Abstract

Demand is a certainty in the future of Unmanned Aircraft Systems. The
potential capabilities of these systems far exceed those of manned aircraft. Without
a human on board, aircraft limitations are diminished and endurance potential is
increased. One endurance limitation that persists is a trait inherent to all aircraft: the
balance between performance and fuel capacity. The ability of an unmanned aircraft
to refuel in flight will limit the impact of that balance.

Refueling  an  unmanned  aircraft  in  flight  is  an  engineering  challenge  that  has 
demanded the better part of a decade. Some successful approaches have used Differential
Global Positioning System (DGPS) between aircraft. Optical sensor tracking
has  shown  potential  as  a  viable  alternative,  or  augmentation,  to  DGPS  for  refueling 
unmanned  systems. 

This research investigates the feasibility, accuracy, and reliability of a predictive
rendering and holistic comparison algorithm with the use of an optical sensor to provide
relative distance and position behind a lead or tanker aircraft. Using an accurate
model of a tanker, an algorithm renders image(s) for comparison with images collected
by a camera installed on the receiver aircraft. Based on this comparison, the information
used to create the rendered image(s) provides the relative navigation solution
required for autonomous air refueling.

Building on previous work, this research reduced the number of required rendered
images to 15 or fewer for each collected image while requiring no modification
to the tanker aircraft. The accuracy of this research is considered good enough for
autonomous operations. The average error was two feet or less at distances of 62.5
feet and closer. A remaining limitation of this approach is the length of time to calculate
a measurement, which can take up to four seconds. Although improvements
are warranted, the methods presented are viable for autonomous air refueling.

Image Dependent Relative Formation Navigation for
Autonomous Aerial Refueling

I.  Introduction 

The  motivation  driving  this  research  is  to  expand  the  utility  of  Unmanned  Aircraft 
Systems  (UAS)  by  incorporating  passive,  vision-based  sensors  that  enable  the 
system  to  conduct  autonomous  aerial  refueling  (AAR).  The  Department  of  Defense 
(DOD), which operates more than 6,800 UAS, has come to rely on their capabilities in
everyday operations [26]. The use of a vision-based sensor for AAR is also motivated
by  the  DOD’s  emphasis  on  both  non-emissive  equipment  and  system  redundancy  (to 
other  on-board  equipment  that  can  enable  AAR),  ensuring  continued  operations  in 
combat  environments. 

Combat commanders use UAS on the battlefield because they significantly
reduce  human  exposure  to  risk,  they  do  not  require  expenditures  for  life  support 
equipment,  and  they  are  not  limited  to  tolerances  of  the  human  body. 

Removing  human  physiological  needs  from  on-board  the  aircraft  eliminates  the 
endurance  limitations  previously  imposed  by  manned  flight.  The  UAS  can  potentially 
loiter (remain airborne) indefinitely, significantly reducing the fuel and time it costs
to transit from an operating field to a point of interest and back. The fuel carried
on the aircraft is currently the limitation of loiter time. A UAS aerial refueling
(AR) capability will increase their endurance, a necessity for expanding their utility.
Transmission delays, limited control fidelity, and inadequate feedback between the
system  operator  and  the  system  aircraft  currently  limit  safe  AR  operations  with  UAS. 
Accomplishing  AR  autonomously,  using  on-board  sensors,  is  a  more  probable  and 
desirable  alternative  method  to  a  system-operator,  manually-controlled  UAS  AR. 

Currently the Air Force Research Laboratory (AFRL) is researching and advancing
the science of AAR. The bulk of that research is aimed at utilizing Differential
Global Positioning System (DGPS) to determine the relative position between a refueling
platform (tanker) and the UAS. This research is ongoing and testing with the
actual transfer of fuel is forthcoming. DGPS is a proven navigational tool and provides
adequate precision for AAR, but could be subject to limitations during wartime
conditions, such as Global Positioning System (GPS) jamming or spoofing. As a result,
AFRL is currently exploring other sensors to augment and provide dissimilar
system redundancy, leading to the secondary motivation for this research.

The  DOD  has  a  vested  interest  to  make  certain  its  systems  have  an  inherent 
redundancy  to  ensure  continued  operation  despite  degradation  to  the  system’s  sensors 
or  equipment.  A  combination  of  solutions  including  DGPS,  integrated  navigation 
solutions,  and  other  measurement  devices  provide  the  desired  redundancy  and  ensure 
both  the  survivability  of  the  UAS  and  its  ability  to  conduct  AAR.  Additionally,  the 
DOD places an emphasis on passive sensors (sensors that do not emit any electromagnetic
energy) to reduce the likelihood of detection when operating in unfriendly
areas of interest. This would hinder information sharing between the aircraft unless
transmissions could be limited in scope, duration, or power. Finally, minimizing
the number of modifications to the existing tanker fleet, such as targets, lights, or
transmission devices, would dramatically reduce the fielding costs as well as long-term
operational and maintenance costs. A vision-based AAR approach is a low-emission,
inexpensive alternative that is dissimilar to DGPS.

The ability of a UAS to achieve AAR using passive on-board sensors will
enhance  the  UAS’  ability  to  operate  for  longer  periods  of  time  independent  of  external 
signals  (such  as  GPS  or  broadcasted  information  from  other  airborne  systems  or 
platforms).  AAR  using  vision-based  capabilities  and  other  position  and  orientation 
estimation  (pose)  equipment  is  a  capability  that  will  aid  the  DOD. 

1.1 Problem Definition

Even  with  the  current  knowledge  base  and  computer  processing  capabilities,  it  is 
difficult  to  attain  precise  navigation  using  vision  information  alone.  The  information 
provided from a typical, non-modified picture is limited to two dimensions. It is possible
to determine in what direction a specific reference point is located, but without
additional information, range to that point is not immediately available. Mathematically
based algorithms can process collected images and infer a relative position of
the point. The science of image-aided navigation is ever expanding and the required
precision is currently possible with enough time, information, and processing of the
image(s).

The critical problem is finding a method of real-time, accurate pose estimation. Real-time
estimation allows incorporation of a navigation solution into an autopilot response. Accurate
estimation introduces fewer errors in the solution and ultimately in the autopilot response.
Both real-time and accurate estimations are necessary to conduct safe UAS
AAR  operations. 

There  are  currently  two  methods  employed  by  the  DOD  to  accomplish  AR.  The 
U.S.  Air  Force  preferred  method  uses  a  refueling  boom  attached  to  the  rear  of  the 
tanker  aircraft.  The  boom  is  controlled  with  flight  control  surfaces  attached  to  it. 
To refuel using this method, the receiver pilot flies the aircraft to within a defined
refueling  envelope  surrounding  the  refueling  boom.  A  boom  operator  on  the  tanker 
aircraft  then  flies  the  boom  into  a  receiver  port  on  the  receiver  aircraft.  Both  pilot 
and  boom  operator  are  responsible  for  the  refueling  connection. 

The  U.S.  Navy  prefers  the  probe  and  drogue  method  of  air  refueling.  Instead  of 
a  boom,  this  method  uses  a  drogue:  a  basket  at  the  end  of  a  fuel  hose  that  extends 
behind  the  tanker  aircraft.  The  receiver  aircraft  has  a  probe  that  extends  into  the 
wind  stream  and  mates  with  the  drogue.  The  tanker  aircraft  crew  has  no  visibility  of 
the  location  and  movements  of  the  receiver  aircraft.  The  pilot  on  the  receiver  aircraft 
is  completely  responsible  for  making  the  refueling  connection.  The  basket  on  the  end 
of the drogue allows the pilot some margin for error and guides the probe into the
drogue. 

Both  methods  have  strengths  and  weaknesses,  and  some  aircraft  have  both  a 
receiver  port  for  a  boom  as  well  as  a  probe  for  a  drogue.  Heavy  aircraft  (aircraft 
certified to take off weighing more than 255,000 pounds) do not normally refuel with
the  probe  and  drogue  method  because  of  their  lack  of  maneuverability.  The  U.S.  Navy 
does  not  operate  large  tanker  aircraft,  or  many  large  receiver  aircraft.  Their  aircraft 
are  generally  too  small  to  accommodate  the  bulky  and  heavy  equipment  required  for 
boom  refueling. 

Demonstrations  and  research  of  AAR  with  UAS  vary  between  the  two  methods, 
and currently some UAS are being built with equipment for both [28]. As of this
writing, no tanker has yet transferred fuel to an unmanned aircraft.

Flying aircraft in close proximity to each other is an inherently dangerous operation.
Human operators require considerable training to perform adequate AR
formation maneuvers, and incidents between a tanker and receiver continue to occur.
An AAR capability for a UAS will necessarily be complex to ensure safe operations
at  all  times. 

Even  when  fully  implemented,  human  operators  will  still  be  integral  to  the 
AAR  process,  just  as  they  are  for  all  UAS  activities.  They  will  have  oversight  of  the 
refueling  and  provide  safety  measures  until  the  reliability  of  the  AAR  system  process 
is  adequately  determined. 

To  limit  the  focus  of  this  research,  the  author  has  assumed  a  boom-refueling 
method  and  that  a  receiver  aircraft  maintaining  a  position  within  the  envelope  with 
less than five feet of error will be able to perform AAR. This research will not investigate
autopilots or controllers to fly a receiver aircraft. Instead, the research will
focus  on  the  navigation  solution  that  will  eventually  allow  an  autopilot  to  maneuver 
the  aircraft  into  the  envelope. 

It is assumed that at least one camera is on board the receiver
aircraft  and  it  has  an  unobstructed  view  of  the  tanker  (minor  discrepancies  from 
limited  dirt  and  grime  on  the  lenses  and  coverings  are  acceptable).  Estimating  an 
initial  relative  position  between  the  UAS  and  refueling  platform  will  be  possible  with 
the method researched here, but is not a focus area. To demonstrate the AAR capability,
the research assumes that an initial, relative position between the aircraft
is  known.  Finally,  no  computations  will  be  completed  on-board  the  aircraft  and  all 
determinations  of  potential  solutions  will  be  post-processed. 

1.2  Research  Solution 

This preliminary background has demonstrated that a need exists for a dissimilar
approach to AAR to augment the DGPS approach or to serve as a backup.
The  method  presented  in  this  thesis  will  use  collected  images  of  a  tanker  as  well  as 
computer-rendered  simulated  images.  The  camera,  installed  on  the  receiver  aircraft 
looking  up  at  the  tanker  aircraft,  collects  images  throughout  the  refueling  process. 

Multiple  renderings  of  a  three-dimensional  tanker  model  permit  a  comparison 
between a collected image and rendered images. This comparison determines a matching
likelihood of each rendered image. The information used to create the most likely
image updates a Kalman filter that tracks the position of the tanker. This approach
builds on many successful methods [5, 15, 18, 23, 27, 29, 30] that use similar vision-dependent
techniques to estimate the position of a vehicle. The method outlined in
this thesis has four specific goals to overcome some of the problems discovered in the
previous  efforts. 
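
The render-and-compare step described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the implementation used in this research: the comparison measure is not specified in this chapter, so normalized cross-correlation is assumed as a stand-in, images are plain nested lists of grayscale values, and the names normalized_correlation and most_likely_pose are hypothetical. The Kalman filter update that consumes the selected pose is not shown.

```python
import math


def normalized_correlation(image_a, image_b):
    """Return a similarity score in [-1, 1] for two equal-size grayscale images.

    Assumption: this measure is a stand-in for the comparison actually used.
    """
    a = [v for row in image_a for v in row]
    b = [v for row in image_b for v in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    return num / den if den > 0 else 0.0


def most_likely_pose(collected, candidates):
    """Pick the candidate whose rendered image best matches the collected image.

    candidates is a list of (pose, rendered_image) pairs, where pose is the
    information that was used to create the rendered image.
    """
    scored = ((pose, normalized_correlation(collected, rendered))
              for pose, rendered in candidates)
    return max(scored, key=lambda item: item[1])


collected = [[0, 1], [1, 0]]
candidates = [
    ((0.0, 0.0, 50.0), [[1, 0], [0, 1]]),  # inverted pattern, poor match
    ((0.0, 0.0, 60.0), [[0, 2], [2, 0]]),  # same pattern, brighter
]
best_pose, best_score = most_likely_pose(collected, candidates)
```

In this toy case the second candidate is selected because correlation ignores the overall brightness difference; the pose attached to it would then serve as the measurement passed to the filter.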

The first goal is to decrease the average time required to determine a solution.
Solutions  presented  in  the  cited  research  range  from  hundredths  of  a  second  to  30 
seconds  for  each  pose  determination.  A  real-time  approach  during  AR  would  need 
updates  consistently  less  than  one  second  apart. 

Increasing the accuracy of this approach with the use of inertial navigation solutions
and Kalman filtering is the second goal. An accurate solution requires average
pose errors of five feet or less [16].

The third goal is to implement the solution with an efficient programming language
(e.g., C) as quickly as possible, requiring the use of open-source libraries. The use
of open-source libraries dramatically reduces the time to develop a solution. The
libraries are efficient and provide many tools that would otherwise require extensive
development and validation.

A final goal to prove this approach as a viable alternative is to accomplish the
three previous goals without the need for any modification to the tanker aircraft.
Achieving  this  last  goal  will  increase  the  utility  of  this  solution  by  minimizing  the 
cost  required  for  implementation. 

1.3  Thesis  Outline 

This  thesis  is  broken  into  six  chapters.  The  first  chapter  details  the  background 
motivating the research, the problem definition, and the general approach taken to
solving  the  problem.  Chapter  2  will  introduce  the  mathematical  description  of  key 
terms  and  relationships  used  throughout  the  paper.  Nomenclature  and  equations  will 
explain  some  of  the  basics  of  navigation,  lenses,  and  cameras.  Chapter  2  also  discusses 
the programming language, the open-source libraries, and the Kalman filtering used in
the  research.  Chapter  3  outlines  the  nature  of  AR  and  characterizes  it  not  only  for  this 
research  but  future  projects  and  explains  position  realization  for  formation  navigation. 
Chapter  4  introduces  the  concept  of  pose,  previous  work  addressing  pose,  and  this 
research’s  approach  to  AAR.  The  experimental  portion  of  the  research  is  presented 
in  Chapter  5,  including  laboratory  work  at  the  Air  Force  Institute  of  Technology 
(AFIT)  and  flight  research  at  the  United  States  Air  Force  Test  Pilot  School  (TPS)  at 
Edwards  AFB,  CA.  This  includes  setup,  collection,  and  validation  of  the  research’s 
approach  and  the  results  of  these  tests.  Finally,  Chapter  6  contains  conclusions  and 
recommendations  for  future  research. 

II. Navigation, Programming, and Mathematical Background


This thesis relies on accurate navigational relationships, programming that creates
real-world representations and interpretations, and mathematical relationships.
This chapter introduces the background of these concepts in three sections.
The first presents a discussion of the nomenclature, assumptions,
and concepts used in navigation. The second section describes the two C programming
libraries (collections of programming resources) used, how they are used, and
what information they provide. The final section provides an overview of Kalman
filtering.

The  thesis  uses  the  following  mathematical  notation: 

• Scalars: Italic type (e.g., $x$) represents scalars.

• Vectors: Bold font, lower case letters (e.g., $\mathbf{x}$) represent vectors.

• Matrices: Bold font, upper case letters (e.g., $\mathbf{T}$) represent matrices, except for
X, Y, and Z which represent the axes of coordinate frames.

• Estimated Variables: The hat character (e.g., $\hat{x}$) denotes an estimate.

• Computed Variables: The tilde character (e.g., $\tilde{x}$) denotes computed variables.

• Homogeneous Coordinates: An underline (e.g., $\underline{x}$) denotes homogeneous coordinates.

2.1  Navigation 

This section presents many aspects of navigation and, wherever possible, explains
them with images and represents them with symbols. This is done to help
the reader understand key relationships applicable to other scientific disciplines. Section
2.1.1 presents the fundamentals of position, followed by the reference frames used
to define those positions in Section 2.1.2. Section 2.1.3 expands those fundamentals
to conversions and transformations between frames and a discussion on lenses and
cameras. 

2.1.1  Position.  In  navigation,  an  object’s  location  relative  to  a  coordinate 
system  and  reference  frame  determines  its  position,  denoted  as  column  vector  p. 
There  are  many  different  reference  frames  based  on  their  origin  and  rotation,  which 
are  explained  in  further  detail  in  Section  2.1.2.  Each  frame  has  two  or  three  axes, 
denoted as $X_{frame}$, $Y_{frame}$, and as necessary, $Z_{frame}$. For these three symbols only,
a subscript following the axis denotes the frame it is associated with. A position’s
translation, measured along each one of these axes, determines the location of the
position referenced in that specific frame. This distance along a coordinate axis is
denoted as $x$, $y$, and as necessary, $z$. Together they are the coordinates of that
position.  Positions  in  this  research  require  coordinates  that  are  annotated  in  both 
Cartesian  and  homogeneous  representation. 

Cartesian coordinates are unique; the translations along a reference-frame axis
are fixed for the instant of time they are referenced. The following notation abbreviates
a position referenced with Cartesian coordinates:

$$
\mathbf{p}^{frame}_{identifier} =
\begin{bmatrix}
p^{frame}_{x,\,identifier} & p^{frame}_{y,\,identifier} & p^{frame}_{z,\,identifier}
\end{bmatrix}^{T}
\qquad (2.1)
$$

where the superscript frame denotes the coordinate frame of reference used to define
the position, identifier denotes the name of the position, and $T$ denotes a transpose of
any array or matrix. When it is obvious which point is being referenced, or for non-specific
points, the identifier subscript is dropped.

Homogeneous coordinates contain an additional scaling term, k. The basis of
homogeneous coordinates allows scaling the coordinates for projective geometry operations.
A scaling that reduces the k coordinate to a value of one is referred to as
normalizing it and is accomplished by dividing all the coordinates by the value of k.

The following notation abbreviates a three-dimensional position in homogeneous
coordinates:

$$
\underline{\mathbf{p}}^{frame}_{identifier} =
\begin{bmatrix} kx \\ ky \\ kz \\ k \end{bmatrix} =
\begin{bmatrix}
\underline{p}^{frame}_{x,\,identifier} \\
\underline{p}^{frame}_{y,\,identifier} \\
\underline{p}^{frame}_{z,\,identifier} \\
\underline{p}^{frame}_{k,\,identifier}
\end{bmatrix}
\qquad (2.2)
$$

for any non-zero value of k.

Positions  referenced  in  two-dimensional  coordinate  frames  are  similar,  but  do 
not include the $z$ or $kz$ term.
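
The relationship between Cartesian and homogeneous coordinates, including the normalization by k, can be illustrated with a short sketch. The function names to_homogeneous, normalize, and to_cartesian are hypothetical and chosen only for this example; positions are plain Python lists, so the same code covers the two-dimensional and three-dimensional cases.

```python
def to_homogeneous(p, k=1.0):
    """Scale Cartesian coordinates by a non-zero k and append k as the last term."""
    if k == 0:
        raise ValueError("k must be non-zero")
    return [k * c for c in p] + [k]


def normalize(p_h):
    """Divide every homogeneous coordinate by k so the last term becomes one."""
    k = p_h[-1]
    if k == 0:
        raise ValueError("k must be non-zero")
    return [c / k for c in p_h]


def to_cartesian(p_h):
    """Normalize, then drop the scaling term to recover Cartesian coordinates."""
    return normalize(p_h)[:-1]


p = [1.0, 2.0, 3.0]
p_h = to_homogeneous(p, k=2.0)   # [kx, ky, kz, k]
p_back = to_cartesian(p_h)       # original Cartesian coordinates
```

Any non-zero k represents the same position, which is why normalizing before comparing two homogeneous vectors is the safer habit.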


2.1.2  Navigational  Reference  Frames.  A  reference  frame  introduces  the  idea 
of  an  observer  of  a  position.  Standing  at  the  origin  of  a  reference  frame  and  locating  an 
object based on its distance along the axes of the reference frame defines the position
of  that  object  in  that  frame.  For  simplicity,  the  frames  in  this  thesis  are  orthogonal 
(all  axes  of  the  frame  are  perpendicular).  As  needed,  introductions  of  other  frames 
that  are  not  in  this  section  occur  throughout  the  thesis. 

2.1.2.1 Inertial Frames. The universally true inertial frame (I-frame)
is the only non-accelerating frame discussed. Using the I-frame would unnecessarily
complicate aircraft navigation that, by definition, is limited to altitudes relatively close
to the surface of the Earth. As such, a locally defined inertial frame, the Earth-centered
inertial frame (i-frame), approximates the I-frame for the short durations of time
associated with aircraft navigation. This frame is defined with an origin at the Earth’s
center  of  mass,  the  Z  axis  through  the  North  Pole,  the  X  axis  pointing  at  the  vernal 
equinox,  and  the  Y  axis  completing  the  right-handed  orthogonal  system  along  the 
equatorial  line.  The  i-frame  does  not  rotate  with  the  Earth,  but  its  origin  translates 
with  the  Earth  as  it  orbits  the  Sun.  In  contrast,  the  I-frame  never  accelerates  or 
rotates. 

2.1.2.2 Earth Frame. A reference frame anchored permanently to
ground-based locations on the Earth permits navigation with respect to the surface
of the Earth. This frame, known as the Earth-centered Earth-fixed frame (e-frame),
shares its Z axis with the i-frame while its X and Y axes rotate about the Z axis at
the same rate as the Earth. The locations of these axes are fixed to the surface of the
Earth, with the X axis extending from the center of the Earth through the intersection
of the Prime Meridian and the Earth’s equatorial line [13], and the Y axis completing
the right-handed orthogonal system. The e-frame components used were those of
the World Geodetic System created in 1984 (WGS 84), as defined by the National
Geospatial-Intelligence Agency (NGA) [13], further discussed below. Positions referenced
in this frame are realized in spherical coordinates as latitude, longitude, and
height above ellipsoid (HAE) or in rectangular coordinates as x, y, and z translations.
In the e-frame these are more commonly referred to as m, n, and w.

Many  regional  and  local  variations  of  the  Earth’s  surface  make  a  mathematical 
model difficult to create and use. The definition of an equipotential surface partially
compensates for these variations. This surface is perpendicular to the local gravity
vector, with equal gravity magnitudes throughout [25]. The shape is called the geoid
and approximates the mean sea level (MSL) across its surface. The definition of a
geometric surface called an ellipsoid, or oblate spheroid, approximates the geoid and
permits mathematical computations that define position and positional relationship
with  respect  to  the  actual  surface  of  the  Earth  or  the  geoid. 

A datum defines a reference ellipsoidal surface for geographic regions of the
Earth. Defining the ellipsoid regionally makes use of an average MSL that minimizes
the altitude deviations from the reference geoid surface. It is important to note that
when discussing an object’s position on the Earth it is impossible to tell where the
precise location is without information about the datum used to define that location.

In order to make world-wide navigation seamless across geographic boundaries
and multiple datums, a geodetic datum exists. This global datum, known as WGS 84,
serves as a best fit for many local systems, enabling reliable world-wide navigation.
The  NGA  is  responsible  for  maintaining  the  model  of  the  ellipsoid  and  the  gravita¬ 
tional  model  needed  to  create  this  datum.  The  NGA  determines  and  tracks  precise 
position  of  several  world-wide  locations  that  influence  the  precision  of  the  model  [13]. 
This  tracking  allows  updates  to  the  model  based  on  changes  of  the  Earth  from  tectonic 
motion, tides, etc. [13].
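
The two realizations of an e-frame position mentioned above (latitude, longitude, and HAE versus rectangular translations) are related through the ellipsoid parameters. The following sketch shows the standard closed-form conversion using the published WGS 84 semi-major axis and flattening. It is an illustration rather than code from this research; the function name geodetic_to_ecef is hypothetical, and angles are assumed to be in radians with HAE in meters.

```python
import math

# WGS 84 defining parameters
WGS84_A = 6378137.0                # semi-major axis, meters
WGS84_F = 1.0 / 298.257223563      # flattening


def geodetic_to_ecef(lat, lon, hae):
    """Convert geodetic latitude/longitude (radians) and HAE (meters)
    to rectangular e-frame coordinates (meters)."""
    e2 = WGS84_F * (2.0 - WGS84_F)              # first eccentricity squared
    sin_lat = math.sin(lat)
    # Radius of curvature in the prime vertical at this latitude
    n = WGS84_A / math.sqrt(1.0 - e2 * sin_lat * sin_lat)
    x = (n + hae) * math.cos(lat) * math.cos(lon)
    y = (n + hae) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - e2) + hae) * sin_lat
    return x, y, z


# A point on the ellipsoid where the Prime Meridian crosses the equator
x0, y0, z0 = geodetic_to_ecef(0.0, 0.0, 0.0)
# A point on the ellipsoid at the North Pole
xp, yp, zp = geodetic_to_ecef(math.pi / 2.0, 0.0, 0.0)
```

The first point lies on the e-frame X axis at one semi-major axis from the origin, and the second lies on the Z axis at the semi-minor axis, which is a quick sanity check on any implementation of this conversion.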

2.1.2.3 Navigation Frame. The navigation frame (n-frame) has its
origin located on a platform of interest (aircraft) and is defined with the X axis
pointing  toward  true  north.  The  Z  axis  points  in  the  direction  of  the  gravitational 
pull  of  the  Earth  and  the  Y  axis  completes  the  right-handed  orthogonal  system.  This 
is  also  referred  to  as  north,  east,  down  or  NED  orientation.  This  frame  does  not 
rotate  with  the  platform.  Its  orientation  is  dependent  on  the  platform’s  location  with 
respect  to  the  e-frame.  The  i-frame,  e-frame,  and  n-frame  are  shown  in  Figure  2.1. 


Figure 2.1: The i-frame, e-frame, and n-frame. The i-frame and e-frame both have
their origin at the center of the Earth and the n-frame has its origin on the platform
of interest [27].
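
The dependence of the n-frame orientation on the platform’s location can be illustrated with the standard rotation from e-frame axes to north, east, down axes. The sketch below is an illustration only, not code from this research. It assumes geodetic latitude and longitude in radians and approximates the direction of gravity by the ellipsoid normal; the function names dcm_e_to_n and rotate are hypothetical.

```python
import math


def dcm_e_to_n(lat, lon):
    """Return the 3x3 matrix that re-expresses an e-frame vector in NED axes
    for a platform at the given latitude and longitude (radians)."""
    sin_lat, cos_lat = math.sin(lat), math.cos(lat)
    sin_lon, cos_lon = math.sin(lon), math.cos(lon)
    return [
        [-sin_lat * cos_lon, -sin_lat * sin_lon,  cos_lat],   # north
        [-sin_lon,            cos_lon,            0.0],       # east
        [-cos_lat * cos_lon, -cos_lat * sin_lon, -sin_lat],   # down
    ]


def rotate(dcm, v):
    """Multiply a 3x3 matrix by a 3-element vector."""
    return [sum(dcm[i][j] * v[j] for j in range(3)) for i in range(3)]


# Platform where the Prime Meridian crosses the equator
c = dcm_e_to_n(0.0, 0.0)
up_in_ned = rotate(c, [1.0, 0.0, 0.0])      # e-frame X axis points straight up here
north_in_ned = rotate(c, [0.0, 0.0, 1.0])   # e-frame Z axis points north here
```

Only a rotation is shown because the vectors here are directions; locating a point in the n-frame would also require subtracting the platform position.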

2.1.2.4 Body Frame. Defining the orientation of an airborne platform
using heading, pitch, and roll requires the use of the body frame (b-frame). For an
aircraft, the b-frame is defined with the X axis projected through the nose of the
aircraft, the Z axis through the bottom of the aircraft and the Y axis completes
the right-handed orthogonal system, generally assumed to be from the origin out the
right wing. The origin can be arbitrarily chosen. Typical choices include: the center
of gravity of the aircraft, a truth collection device, or an inertial sensing unit. This
frame rotates in conjunction with the platform on which it is defined and is shown
in Figure 2.2, on an aircraft in a formation. The subscript W on the frame identifier
denotes the aircraft’s position in the formation, the wing aircraft, further explained
in Section 2.1.4. The cube in the figure denotes the origin of this aircraft’s b-frame
and n-frame.


Figure 2.2: The b-frame of a wing aircraft. Denoted as the $b_W$-frame, the $b_W$-frame and
$n_W$-frame share a common origin.

2.1.2.5 Camera Frame. Referencing a position with respect to a camera
requires the camera frame (cam-frame) as shown in Figure 2.3. The origin of
the frame is located at the optical center of the camera, denoted with a circle in the
figure. The frame is defined with the Z axis out of the front of the lens. The X axis
projects out of the right of the lens and the Y axis projects out of the bottom of the
lens. This frame references the position of objects in the camera’s field of view that
are projected onto images. This process is further discussed in Section 2.1.3.5.

Figure 2.3: The cam-frame. The camera and the cam-frame are shown as installed
on an aircraft’s dashboard.

2.1.2.6 Image Frame. There are three different two-dimensional frames
associated with images used in this research and two different types of images. The
three frames are the image-frame, the GLimage-frame, and the CVimage-frame; the
latter two are further discussed in Sections 2.2.1 and 2.2.2. The symbol $I$ represents
an image and the two types of images are rendered and collected. The symbol $I_r$ refers
to images that are rendered with the use of the OpenGL library, further detailed in
Section 2.2.1. The symbol $I_c$ refers to images that are collected with the camera.

The optical center of the camera is shown in Figure 2.3 and is the origin of
the cam-frame. The cam-frame origin projected onto an $I_c$ (along the Z axis of the
cam-frame) is typically close to, but not always at, the center of the $I_c$. Referenced to
the image-frame, translations $x_0$ and $y_0$ denote the location of this projection, known
as the principal point. The origin and axes of the image-frame, the location of the
principal point, and their relation to a generic $I_c$ are shown in Figure 2.4.


Additionally, $I_c$ images and potentially $I_r$ images are not necessarily square.
Denoted in Figure 2.4 as $\theta_s$ [11], a skew angle is the angle between the top and sides
of these images and is a concern when it is not 90°.¹
Figure 2.4: The image-frame. The origin and axes of the frame for a generic $I_c$ are
shown with the principal point (projection of the cam-frame X and Y axes on the
$I_c$). The generic $I_c$ has a width of N pixels and a height of M pixels. For a non-square
$I_c$, $\theta_s$ represents the angle between the top and left side of the $I_c$.


The origin of the image-frame is in the top-left corner of the $I_c$, eliminating
negative translation values (along the X and Y axes) and the need to have access
to the coordinates of the principal point when referencing positions on the $I_c$. The
origin is actually 0.5 pixels up and 0.5 pixels to the left of the top left corner of the
$I_c$ [27], allowing the center of the first pixel to be annotated as shown in Figure 2.4.

¹The subscript s is added to avoid confusion with $\theta$ associated with the roll angle of an aircraft.

The X axis of the image-frame defines the pixel location to the right of the
origin and the Y axis defines the pixel location below the origin. An $I_c$ is typically
determined by dimensions measured in both a standardized unit of measurement
(centimeters, millimeters, inches, etc.) as well as camera-dependent dimensions measured in
pixels. The terms W and H denote standardized units of measurement for width and
height, respectively; the terms N and M denote the pixel units of width and height,
respectively.

This  section  covered  many  of  the  frames  used  throughout  this  thesis.  The  next 
section  explains  the  relationship  between  frames. 

2.1.3 Reference Frame Conversions. There is a limit to the utility of understanding
an object’s position in a single frame of reference if that position cannot be
referenced  in  other  frames  for  further  analysis  and  computation.  This  section  presents 
various  reference-frame  conversions  to  increase  the  utility  of  both  the  reference  frames 
and  positions. 

This thesis references points in both three-dimensional and two-dimensional
space, and in both Cartesian and homogeneous coordinates. This requires the ability
to convert between them all. The reference-frame conversion between Cartesian
coordinate systems can require a translation and a rotation. Translation accounts
for differences in the origins of the frames. Rotation accounts for the difference in
orientation between the frames. The reference-frame conversion between homogeneous
coordinates requires a transformation or camera matrix accounting for both origin
and orientation differences. These transformations are described as mappings or
projections and are detailed in Section 2.1.3.5.

Direction Cosine Matrices (DCMs, Section 2.1.3.1) and Euler Angles (Section 2.1.3.2)
convert Cartesian coordinates of a position referenced in one frame to
Cartesian coordinates of the same point referenced in another frame. As shown in
Figure 2.5, the actual position of an object (a camera) does not move during the
conversion. The difference in translations along each axis is evident when the
two frames are oriented in the same manner (right side of the image).

Figure 2.5: Reference frame conversion. Referencing a point in a different frame
does not change the location of the point. The b-frame is rotated to have the same
orientation as the n-frame in the right side of the image.


2.1.3.1 Direction Cosine Matrix. A DCM is a 3x3 matrix that simplifies
the rotations from one reference frame to another. The matrix is created by
expressing each axis unit-vector of one frame as a vector with respect to the other
frame. Generally, the nomenclature for a DCM uses the symbol $C$, where the
superscript immediately following designates the reference frame being converted to
and the subscript immediately following designates the reference frame being
converted from (for example, $C_n^b$ converts from the n-frame to the b-frame).

As an example, consider a camera with a known position ($[-1, +1, -1]^T$) in the
n-frame, $p^n_{CAM}$, shown in Figure 2.6. This figure shows the position of the camera
referenced in both frames.

The symbol $C_n^b$ is a DCM that rotates a position in the n-frame to a position
in the b-frame (assuming collocated origins). $C_n^b$ multiplied by the position $p^n_{CAM}$
determines the location of the camera in the b-frame, or $p^b_{CAM} = C_n^b\, p^n_{CAM}$.

Figure 2.6: Reference frame conversion, DCM. A point with a known location in the
n-frame is rotated to be defined in the b-frame. The b-frame is rotated to have the
same orientation as the n-frame in the right side of the image.


Equations (2.3) and (2.4) expand this relationship for the defined frames in Figure 2.6:

$$ C_n^b = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{bmatrix} \tag{2.3} $$

$$ p^b_{CAM} = C_n^b\, p^n_{CAM} = [-1, -1, +1]^T \tag{2.4} $$

DCMs are by definition orthonormal and non-singular and have the following
properties:

$$ \det(C_n^b) \triangleq |C_n^b| = 1 \tag{2.5} $$

$$ C_b^n = (C_n^b)^{-1} = (C_n^b)^T \tag{2.6} $$

$$ C_e^b = C_n^b\, C_e^n \tag{2.7} $$


2.1.3.2 Euler Angles. Euler Angles are a second method to describe
the rotation between reference frames by specifying an angle of rotation in a single
two-dimensional plane at a time. By projecting a three-dimensional frame onto a
two-dimensional plane of another reference frame, the XY plane for example, the rotation
between the two frames is a single angle. This angle is the first rotation, shown as
the $\psi$ rotation in Figure 2.7. Projection onto a second plane defines another angle,
and then a third, such that three angles ($\psi$, $\theta$, $\phi$) are attained.


Figure 2.7: Rotational Euler angles for an n-frame to b-frame conversion. The n-frame
has been offset from the b-frame intentionally. The angles are determined and applied
in series [20].

Euler angles have some limitations. The angles are computed in series and must
be applied in the same order. A common convention, and the one used in this research,
is 3-2-1: first the XY plane (yaw), then the XZ plane (pitch), and finally the
YZ plane (roll). If the same order is not maintained throughout, attitude errors will
occur.

For n-frame to b-frame conversions, the Euler Angles are defined as (in order of
3-2-1):

• $\psi$ = rotation about the $Z_b$ axis (heading or yaw)

• $\theta$ = rotation about the $Y_b$ axis (pitch)

• $\phi$ = rotation about the $X_b$ axis (roll)

In this thesis, the terms heading and yaw are interchangeable. A positive Euler
angle corresponds to a positive rotation about that axis defined by the right-hand
rule. The rotation described by the example in Section 2.1.3.1 was a roll rotation of
180°.

Conversions also exist between DCMs and Euler angles. The following in-matrix
computation uses the Euler angles to create the b-frame to n-frame DCM [25]:

$$ C_b^n = \begin{bmatrix}
\cos\psi\cos\theta & \cos\psi\sin\theta\sin\phi - \sin\psi\cos\phi & \cos\psi\sin\theta\cos\phi + \sin\psi\sin\phi \\
\sin\psi\cos\theta & \sin\psi\sin\theta\sin\phi + \cos\psi\cos\phi & \sin\psi\sin\theta\cos\phi - \cos\psi\sin\phi \\
-\sin\theta & \cos\theta\sin\phi & \cos\theta\cos\phi
\end{bmatrix} \tag{2.8} $$

The substitutions $\psi = 0^\circ$, $\theta = 0^\circ$, and $\phi = 180^\circ$ result in the same DCM ($C_n^b$
or $C_b^n$, equivalent in that example through the relationship in Equation (2.6)) shown
in Equation (2.3). With a known DCM, the following computations determine the
Euler angles:

$$ \theta = -\sin^{-1}\left( C_b^n(3,1) \right) \tag{2.9} $$

$$ \phi = \sin^{-1}\left( \frac{C_b^n(3,2)}{\cos(\theta)} \right) \tag{2.10} $$

$$ \psi = \sin^{-1}\left( \frac{C_b^n(2,1)}{\cos(\theta)} \right) \tag{2.11} $$

where $C_b^n(i,j)$ represents the coefficient in the ith row and jth column of the DCM.

From these equations, a singularity occurs when $\theta$ (pitch) is close to or equal to
±90° (cos(±90°) = 0, causing Equations (2.10) and (2.11) to be undefined). This is
not a concern in this research because an aircraft at this attitude would be completely
vertical; the test flights did not put the aircraft in this condition.
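The conversions between Euler angles and DCMs can be checked numerically. The following is a minimal sketch in Python rather than the C used for the thesis software; the function names are hypothetical. It builds the b-frame to n-frame DCM in the 3-2-1 order, applies the 180° roll from the example in Section 2.1.3.1 to the camera position $[-1, +1, -1]^T$, and recovers the angles with the arcsine forms, which only return angles within ±90°.

```python
import math

def dcm_from_euler(psi, theta, phi):
    """b-frame to n-frame DCM for 3-2-1 Euler angles given in radians."""
    cps, sps = math.cos(psi), math.sin(psi)
    cth, sth = math.cos(theta), math.sin(theta)
    cph, sph = math.cos(phi), math.sin(phi)
    return [
        [cps * cth, cps * sth * sph - sps * cph, cps * sth * cph + sps * sph],
        [sps * cth, sps * sth * sph + cps * cph, sps * sth * cph - cps * sph],
        [-sth, cth * sph, cth * cph],
    ]

def euler_from_dcm(c):
    """Recover (psi, theta, phi) with arcsines; valid only within +/-90 deg."""
    theta = -math.asin(c[2][0])
    phi = math.asin(c[2][1] / math.cos(theta))
    psi = math.asin(c[1][0] / math.cos(theta))
    return psi, theta, phi

def transpose(c):
    """Transpose of a 3x3 matrix; for a DCM this is also its inverse."""
    return [list(row) for row in zip(*c)]

def rotate(c, p):
    """Multiply a 3x3 DCM by a 3-element position vector."""
    return [sum(c[i][j] * p[j] for j in range(3)) for i in range(3)]

# Roll of 180 degrees, no yaw or pitch.
c_b_n = dcm_from_euler(0.0, 0.0, math.pi)
c_n_b = transpose(c_b_n)
p_n = [-1.0, 1.0, -1.0]          # camera position in the n-frame
p_b = rotate(c_n_b, p_n)         # same camera, referenced in the b-frame
print([round(v, 6) for v in p_b])

# Round trip for small angles, where the arcsine forms are unambiguous.
print(euler_from_dcm(dcm_from_euler(0.3, -0.2, 0.1)))
```

Because the arcsine is limited to ±90°, the 180° roll itself cannot be recovered by `euler_from_dcm`; the round trip is therefore shown only for small angles.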

When  the  origins  of  the  reference  frames  are  not  collocated,  the  translation 
between  frames  is  performed  prior  to  the  rotations  previously  described. 




The following equation represents an e-frame to n-frame conversion:

$$ p^n_{CAM} = C_e^n \left( p^e_{CAM} - p^e_{NAV} \right) \tag{2.12} $$

where $p^e_{NAV}$ is the location of the new coordinate system with respect to the former,
in this case the origin of the n-frame as it is located in the e-frame.

The transformations and projections of homogeneous coordinates expand on the
rotation and translation operations presented for Cartesian coordinates. The
next section presents the transformation matrices describing this process.

2.1.3.3 Transformation Matrix. A transformation between coordinates
is a linear relationship, represented in this thesis by a matrix, T. The DCMs
presented in Section 2.1.3.1 are a special type of 3x3 transformation.

Transformation matrices permit general mapping and projections of points between
coordinate frames. Beyond the specific rotational transformation of a DCM, a
general transformation matrix can scale or shear points in addition to rotating them.
A perspective projection is a specific type of transformation. This transformation
projects three-dimensional positions onto a two-dimensional plane along lines
that emanate from a single location, the center of the projection.

The effects of homogeneous coordinates in a perspective projection transformation
are best illustrated with an example. Here, the origin of the cam-frame
is designated as the center of projection. Figure 2.8 depicts two distinct
points in three-dimensional space, with the Cartesian coordinates shown in the image
($p_1^{cam} = [x, y, z]^T$ and $p_2^{cam} = [2x, 2y, 2z]^T$). The plane the points are projected onto
is parallel to the Y and X axes at a distance of one unit. Through a perspective
projection, both points project to the same position on this two-dimensional plane.
Because the coordinates of these points all share the same ratio, 2:1, the points represent
two of an infinite number of points, along the same line, that will also project to
the same point on this plane. The scaling term, k = 1, included in the homogeneous
coordinates permits the scaling needed to project these points to the same position on
the plane using a single matrix. The correct homogeneous coordinates to use in this
projection are $p_1^{cam} = [x, y, z, 1]^T$ and $p_2^{cam} = [2x, 2y, 2z, 1]^T$.

Figure 2.8: Projecting three-dimensional points onto a plane. All the points along a
line emanating from the center of the projection (the origin of the cam-frame) project
to the same position using a perspective projection transformation.

This projection of points is shown mathematically for the example points in
Figure 2.8. Consider the transformation matrix, T:

(2.13)

Projecting these points onto this plane with the use of the transformation has
the following results:

where the * represents the new location of the point.

These two points can be normalized by the value of their k scaling term; in other
words, scaled by $1/z$ and $1/(2z)$, respectively. The resulting points, $p_{1*}^{cam}$ and
$p_{2*}^{cam}$, occupy the same position, $[x/z,\ y/z,\ 1]^T$, located on the plane.
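The normalization step can be illustrated with a short numerical sketch. The matrix below is an assumption chosen for illustration (an identity block with a zero fourth column, projecting onto the plane one unit along the Z axis); it is not taken from Equation (2.13), and the function name `project` and the sample coordinates are hypothetical.

```python
def project(T, p_h):
    """Apply a 3x4 matrix T to a homogeneous point [x, y, z, 1], then
    normalize the result by its scaling term k (the last coordinate)."""
    q = [sum(T[i][j] * p_h[j] for j in range(4)) for i in range(3)]
    k = q[2]
    return [q[0] / k, q[1] / k, q[2] / k]

# Assumed perspective projection onto the plane at unit distance along Z.
T = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]

x, y, z = 3.0, 1.5, 6.0                 # arbitrary sample point
p1 = [x, y, z, 1.0]
p2 = [2 * x, 2 * y, 2 * z, 1.0]         # twice as far along the same line

p1_star = project(T, p1)                # scaling term k = z
p2_star = project(T, p2)                # scaling term k = 2z
print(p1_star, p2_star)
```

Both calls return the same normalized point, which is the behavior the example describes: every point along a line through the center of projection lands at one position on the plane.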

A  camera  can  be  represented  in  a  similar  manner.  A  camera  uses  a  lens  to 
capture  visible  points  along  lines  emanating  from  the  origin  of  the  cam-frame  onto  a 
two-dimensional  plane,  or  Ic.  This  information  is  a  two-dimensional  representation  of 
the  three-dimensional  world  where  this  camera  exists.  A  3x4  transformation  matrix, 
referred  to  as  the  camera  matrix  K,  represents  a  perspective  projection  transformation 
specific to a given camera and lens. Mathematically, K describes the mapping of
three-dimensional  positions  onto  two-dimensional  images  attained  through  the  lens 
[8].  Before  fully  introducing  the  camera  matrix,  the  next  section  presents  a  basic 
understanding  of  lenses. 

2.1.3.4 Pinhole Camera Model. Projective geometry
is the basis for the projection of a three-dimensional world onto a two-dimensional
image by a lens. Understanding the geometries involved with lenses helps
develop precise mathematical relationships between an object and an image. These
mathematical relationships permit real-world determinations of positions based on
information contained in an image. A pinhole camera model approximates the relationship
between the real world and an image collected by an ideal pinhole camera.
Using this model, the necessary relationships are developed for a generic camera.

A typical lens used by a camera (biconvex) bends parallel light incident on its
surface towards a point that is a fixed distance, known as the focal length ($f$),
behind the lens. An example biconvex lens is shown in Figure 2.9.

The lens also alters non-parallel light, but in a different manner that is better
understood by examining the fundamental equation of the thin lens from [12], as
acquired from [27] and shown in Figure 2.10.

Figure 2.9: A biconvex lens. Parallel light hitting a biconvex lens focuses at a single
point that is a fixed distance behind the lens. This distance is the focal length of the
lens.

Figure 2.10: Thin lens model. The fundamental equation of the thin lens explains that
light incident on the lens from a point source (on the top of the arrow) a distance
($d_o$) in front of the lens arrives at the same point a distance ($d_i$) behind the lens.

From the thin lens theory, light that is incident on the lens and not parallel is refracted
in a predictable manner, such that all light that radiates from a point source (the
tip of the arrow in the figure) arrives at the same point a certain distance behind the
lens.

The following relationship between the distances in Figure 2.10 can be shown:

$$ \frac{1}{d_o} + \frac{1}{d_i} = \frac{1}{f} \tag{2.16} $$


where $d_o$ is the distance from the object in the scene to the lens, $d_i$ is the distance from
the lens to the image, and $f$ is the focal length of the lens. When the object moves farther
away ($d_o$ increases) from a lens with a fixed focal length ($f$ constant), the distance
to the image ($d_i$) decreases and the relative image size (or the image translation in the
Y axis) decreases.
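The trend described above follows directly from the thin lens equation, $1/d_o + 1/d_i = 1/f$. The sketch below solves it for $d_i$ at several object distances; the 50 mm focal length and the function name `image_distance` are hypothetical values chosen for illustration.

```python
def image_distance(d_o, f):
    """Solve the thin lens equation 1/d_o + 1/d_i = 1/f for d_i.
    All distances share one unit; requires d_o > f for a real image."""
    if d_o <= f:
        raise ValueError("object must be farther from the lens than f")
    return 1.0 / (1.0 / f - 1.0 / d_o)

f = 0.05  # hypothetical focal length of 50 mm, expressed in meters
for d_o in (1.0, 10.0, 100.0):
    d_i = image_distance(d_o, f)
    # d_i / d_o is the ratio of image size to object size.
    print(d_o, d_i, d_i / d_o)
```

As $d_o$ grows, $d_i$ shrinks toward $f$ and the ratio $d_i / d_o$ falls, so a distant object forms a smaller image located nearly one focal length behind the lens.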

The pinhole model reduces the size of the lens in Figure 2.10 to the size of the tip
of a pin. All of the light from the scene that is incident on the lens passes through the
optical center of the lens and is projected onto an image plane located a focal length,
$f$, behind the lens. This is shown in Figure 2.11, where $p_1^{cam}$ is an arbitrary point source
at the tip of the arrow.

Figure 2.11: Pinhole model. Light from the scene, incident on the lens, is projected
to an image a distance $f$ behind the lens.

The physical location of $p_1^{cam}$ does not actually move to the image plane. Annotating
the subscript identifier with an * differentiates this visual representation of a
point from the physical location of a point.

From the pinhole model depiction in Figure 2.11, the two triangles created by
dotted lines on either side of the lens are geometrically similar. The angles in the
triangles are the same, the ratios between the sides are the same, and the locations
of the triangles’ vertices are related by the negative of that same ratio, or $-f/z$. This
analogy shows that the translation of the visual representation along the Z axis, $p^{cam}_{z,1*}$, is
equal to the ratio $-f/z$ times the translation of the original point in the scene
($p^{cam}_{z,1*} = -\frac{f}{z}\, p^{cam}_{z,1}$). The following equation expands this relationship to all the position
coordinates of the points [27] [12]:

$$ p_{1*}^{cam} = -\frac{f}{z}\, p_1^{cam} \tag{2.17} $$

Individually, all the scalar translations of $p_1^{cam}$ are scaled by the same ratio
and negated. It is typical, for visual simplicity and to minimize the mathematical
conversions required, to place the image plane in front of the camera frame (such that
$p^{cam}_{z,1*} = +f$) [8], as shown in subsequent figures. This also removes the need for the
negative sign in Equation (2.17).

The physical effects of a lens create a perspective projection transformation
matrix, detailed in Section 2.1.3.3, described as a linear mapping of homogeneous
points in [8]. The following equation expands the relationship of Equation (2.17)
into a matrix that transforms three-dimensional homogeneous coordinates to
two-dimensional homogeneous coordinates:

$$ p_{1*}^{cam} = \begin{bmatrix} f_x & 0 & 0 & 0 \\ 0 & f_y & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = T \cdot p_1^{cam} \tag{2.18} $$

where $f_x$ and $f_y$ are the focal lengths in the $X_{cam}$ axis and the $Y_{cam}$ axis, respectively.

The normalization of the resulting two-coordinate homogeneous point (dividing the
coordinates by the k value, which is z) in Equation (2.18) is the same as the division
by z in Equation (2.17). Expanding the two-axis pinhole model to show all three axes
shows the relationship between the cam-frame and the image-frame in Figure 2.12.
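The equivalence between the matrix form and the scalar ratio $f/z$ can be sketched as follows, assuming the image plane is placed in front of the camera so that no negation is needed. The focal length, the sample points, and the function name `project_pinhole` are hypothetical.

```python
def project_pinhole(p_cam, f_x, f_y):
    """Project a cam-frame point with a 3x4 pinhole matrix, then normalize
    by the scaling term k, which equals the depth z of the point."""
    x, y, z = p_cam
    T = [[f_x, 0.0, 0.0, 0.0],
         [0.0, f_y, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0]]
    p_h = [x, y, z, 1.0]
    q = [sum(T[i][j] * p_h[j] for j in range(4)) for i in range(3)]
    k = q[2]
    return (q[0] / k, q[1] / k)

f = 0.5                                       # same focal length in both axes
near = project_pinhole((2.0, 1.0, 4.0), f, f)
far = project_pinhole((2.0, 1.0, 8.0), f, f)  # same offsets, twice the depth

# Scalar form of the same relationship: multiply each translation by f / z.
near_scalar = (f / 4.0 * 2.0, f / 4.0 * 1.0)
print(near, far, near_scalar)
```

Doubling the depth halves both image coordinates, and the matrix result matches the scalar ratio applied to each axis.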


Figure 2.12: The cam-frame and image-frame relationship. A three-dimensional pinhole
camera model demonstrates this relationship.


This relationship is valid for the pinhole model; however, real-world lenses are
not ideal pinhole lenses. Real lenses distort the image [9]. Distortion effects of a
lens, specifically a camera lens, are examined in the next section. In addition, the
transformation between the real world and images collected by a camera (Ic images)
is expanded to include the conversion to the image-frame.

2.1.3.5 Camera Model and Camera Matrix. Locating an object in an
image involves three sets of characteristics of a camera. The first set is
called the extrinsic parameters of the camera. These parameters locate the object in
the external reference frame of the camera, or the cam-frame. The second
set is the distortion effects from a non-pinhole lens. The third
set is called the intrinsic parameters of the camera. These parameters locate the object
in the internal reference frame of the camera, or the image-frame. This section
details the parameters used to create this entire transformation.

The external relationship between the camera and other navigation frames uses
the same rotations and translations as described in Section 2.1.3. The following
equation expands the relationship of Equation (2.12), but not the specific values, to
account for homogeneous coordinates [8]:

$$ p^{cam} = \begin{bmatrix} C_e^{cam} & -C_e^{cam}\, p^e_{CAM} \\ 0_{1\times3} & 1 \end{bmatrix} p^e \tag{2.19} $$

where $p^e$ is any point referenced in the e-frame, $p^e_{CAM}$ is the location of the camera
in the e-frame, and $0_{1\times3}$ is a row vector of three zeros. This matrix is the same as
translating and then rotating the location of the point into a new reference frame. This
transformation is the extrinsic relationship of the camera to other reference frames.
The distortion effects of the camera are presented next.
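The statement that the homogeneous matrix is the same as translating and then rotating can be verified with a small sketch. The block layout used here, a DCM beside the column $-C\,p$ above a row $[0, 0, 0, 1]$, is the standard form of such a matrix; the rotation, camera location, point, and function names are hypothetical values chosen for illustration.

```python
def extrinsic_matrix(c_e_cam, p_cam_e):
    """Build the 4x4 matrix [[C, -C p], [0, 1]] from a 3x3 DCM (e-frame to
    cam-frame) and the camera location p expressed in the e-frame."""
    t = [-sum(c_e_cam[i][j] * p_cam_e[j] for j in range(3)) for i in range(3)]
    rows = [list(c_e_cam[i]) + [t[i]] for i in range(3)]
    rows.append([0.0, 0.0, 0.0, 1.0])
    return rows

def transform(m, p_e):
    """Apply the 4x4 matrix to a Cartesian point via homogeneous coordinates."""
    p_h = list(p_e) + [1.0]
    q = [sum(m[i][j] * p_h[j] for j in range(4)) for i in range(4)]
    return [q[i] / q[3] for i in range(3)]

def translate_then_rotate(c_e_cam, p_cam_e, p_e):
    """Cartesian form: subtract the camera origin, then rotate."""
    d = [p_e[j] - p_cam_e[j] for j in range(3)]
    return [sum(c_e_cam[i][j] * d[j] for j in range(3)) for i in range(3)]

# Hypothetical values: a 90-degree rotation about Z and a camera at (1, 2, 3).
C = [[0.0, 1.0, 0.0],
     [-1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
camera = [1.0, 2.0, 3.0]
point = [4.0, 6.0, 3.0]

a = transform(extrinsic_matrix(C, camera), point)
b = translate_then_rotate(C, camera, point)
print(a, b)
```

Both routes give the same cam-frame coordinates; the matrix form is convenient because it can be chained with the projection matrices by multiplication.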

A simple model for the distortion effects of the lens approximates the transformation
of the normalized points in the cam-frame to distorted positions in the
cam-frame. A calibration process for a specific camera determines these distortion
effects [2]. Two components, radial and tangential, comprise the total distortion model.
The first component, radial, affects both the x translation and the y translation in
the same manner, as a function of their distance $r$ from the $Z_{cam}$ axis [9]:

$$ x_r = x\,(1 + c_1 r^2 + c_2 r^4 + \dots) \tag{2.20} $$

$$ y_r = y\,(1 + c_1 r^2 + c_2 r^4 + \dots) \tag{2.21} $$

with the temporary substitutions x for $p_x^{cam}$ and y for $p_y^{cam}$, so that $r^2 = x^2 + y^2$.
$x_r$ and $y_r$ are the positions of the point in the cam-frame, accounting for radial
distortion; $c_1$ and $c_2$ are arbitrary coefficients that a calibration of the camera and the
lens determines. How well this distortion model estimates the actual distortion of the
lens depends on the number of coefficients used. For this thesis, three coefficients
were sufficient.

The tangential distortion model affects the x translation and the y translation
differently. This part of the distortion model accounts for non-symmetric distortion
between the two axes; the tangential distortion in each axis is dependent on both
translations:

$$ \delta x_t = 2 c_3 x y + c_4 (r^2 + 2x^2) + \dots \tag{2.22} $$

$$ \delta y_t = 2 c_4 x y + c_3 (r^2 + 2y^2) + \dots \tag{2.23} $$

where $\delta x_t$ and $\delta y_t$ are the changes in position of the point in the cam-frame, accounting
for tangential distortion. Common practice is to limit these coefficients to two.

One overall distortion model combines the effects of the two distortion models,
with the addition of the third radial-distortion coefficient, $c_5$ [9]:

$$ x_d = x + c_1 r^2 x + c_2 r^4 x + c_5 r^6 x + 2 c_3 x y + c_4 (r^2 + 2x^2) \tag{2.24} $$

$$ y_d = y + c_1 r^2 y + c_2 r^4 y + c_5 r^6 y + 2 c_4 x y + c_3 (r^2 + 2y^2) \tag{2.25} $$

where $x_d$ and $y_d$ are the normalized positions of the point, accounting for distortion.
Through a calibration process, the determination of these coefficients creates the
distortion model for a particular camera and lens combination at a fixed focal length.
With a defined distortion model, a common practice is to un-distort the image. An
un-distorted image approximates the image a pinhole camera would have collected. This
is accomplished by moving individual pixels in an Ic through the reverse of the above
equations. The result is a better estimate of the normalized locations of the points in
the cam-frame. Because the intrinsic transformation of the camera (presented next)
is a linear transformation, this un-distortion process can be applied directly to the Ic.
For the remainder of this thesis, the theory presented assumes un-distorted images.
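The combined model can be sketched in a few lines. The `distort` function below follows the five-coefficient form (three radial terms $c_1$, $c_2$, $c_5$ and two tangential terms $c_3$, $c_4$). The `undistort` function is an assumption: it inverts the model by fixed-point iteration, which is one common way to reverse the equations and is not necessarily the procedure used in this research. The coefficient values are hypothetical.

```python
def distort(x, y, c1, c2, c3, c4, c5):
    """Map normalized cam-frame coordinates to their distorted positions."""
    r2 = x * x + y * y                       # squared distance from the Z axis
    radial = c1 * r2 + c2 * r2 ** 2 + c5 * r2 ** 3
    x_d = x + radial * x + 2.0 * c3 * x * y + c4 * (r2 + 2.0 * x * x)
    y_d = y + radial * y + 2.0 * c4 * x * y + c3 * (r2 + 2.0 * y * y)
    return x_d, y_d

def undistort(x_d, y_d, coeffs, iterations=25):
    """Assumed inverse: repeatedly subtract the current distortion error.
    Converges when the distortion is small, as with typical lenses."""
    x, y = x_d, y_d
    for _ in range(iterations):
        fx, fy = distort(x, y, *coeffs)
        x, y = x - (fx - x_d), y - (fy - y_d)
    return x, y

coeffs = (-0.2, 0.05, 0.001, -0.002, 0.0)    # hypothetical calibration result
x_d, y_d = distort(0.3, -0.2, *coeffs)
x_u, y_u = undistort(x_d, y_d, coeffs)
print((x_d, y_d), (x_u, y_u))
```

With all coefficients set to zero, `distort` returns its input unchanged, which corresponds to the ideal pinhole case.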


The final characteristics of a camera, the intrinsic parameters, determine the
transformation of a position in the cam-frame to its representation on the Ic as
referenced in the image-frame. Equation (2.18) detailed the projection from the
scene to the image plane, with both points referenced in the cam-frame. To convert
the visual representation into the image-frame, an additional scaling and translation
occur. This transformation can be seen in Figure 2.13.

Figure 2.13: Mapping cam-frame to image-frame. The intrinsic camera parameters
determine the projection of an object, located physically in the cam-frame, to a visually
represented location in the image-frame.


The scaling terms account for the difference in the units of measurement. Frames
external to the camera denote translations in a standardized measurement unit (feet,
meters, miles, etc.); the image-frame denotes translations in pixels. To scale the
locations, two ratios are used, one for each of the X and Y axes. The ratio of the
width in pixels over the width in standardized measurements, or $N/W$, scales the X axis
translation; similarly, $M/H$ scales the Y axis translation. Additionally, the term skew,
or $s$, accounts for the possibility of a non-orthogonal Ic, defined as:


$$ s = \frac{N}{W}\,\frac{M}{H}\,\cot(\theta_s) \tag{2.26} $$


The translation between frames accounts for the location of the principal point
in the image-frame, $[x_0, y_0]^T$.

The addition of the scaling and translation to Equation (2.18) provides the
complete transformation from cam-frame to image-frame:

The $f_x \frac{N}{W}$ term represents the focal length of the lens in the $X_{cam}$ axis, in pixels;
it is replaced by a single symbol, $\alpha$. The $f_y \frac{M}{H}$ term represents the focal length of the
lens in the $Y_{cam}$ axis, in pixels; it is replaced by $\beta$.

The camera matrix K is defined:

$$ K = \begin{bmatrix} \alpha & s & x_0 & 0 \\ 0 & \beta & y_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \tag{2.28} $$
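A short sketch shows how the camera matrix maps an un-distorted cam-frame point to pixel coordinates. The focal lengths in pixels are formed as the focal length times the pixel-per-unit ratios described above, and the skew term is passed in directly rather than computed. The sensor dimensions, focal length, principal point, and function names are hypothetical.

```python
def camera_matrix(f_x, f_y, n_px, m_px, w, h, x0, y0, s=0.0):
    """Build the 3x4 matrix K from focal lengths (same unit as w and h),
    image size in pixels (n_px by m_px), sensor size (w by h), the principal
    point (x0, y0) in pixels, and an optional skew term s."""
    alpha = f_x * n_px / w      # focal length along X, in pixels
    beta = f_y * m_px / h       # focal length along Y, in pixels
    return [[alpha, s, x0, 0.0],
            [0.0, beta, y0, 0.0],
            [0.0, 0.0, 1.0, 0.0]]

def to_pixels(K, p_cam):
    """Project a cam-frame point and normalize by the scaling term k."""
    p_h = list(p_cam) + [1.0]
    q = [sum(K[i][j] * p_h[j] for j in range(4)) for i in range(3)]
    return (q[0] / q[2], q[1] / q[2])

# Hypothetical camera: 8 mm lens, 6.4 mm x 4.8 mm sensor, 640 x 480 pixels.
K = camera_matrix(0.008, 0.008, 640, 480, 0.0064, 0.0048, 320.0, 240.0)

on_axis = to_pixels(K, (0.0, 0.0, 10.0))    # lands on the principal point
off_axis = to_pixels(K, (1.0, 0.5, 10.0))   # right of and below the center
print(on_axis, off_axis)
```

A point on the optical axis lands on the principal point, and positive X and Y offsets move the projection to the right and downward, matching the image-frame axes.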


Combining the intrinsic and extrinsic transformations of the camera, and assuming
the removal of distortion effects, the following complete transformation relates
real-world locations to their location in an image:

(2.29)


This section demonstrated the relationship between Cartesian and homogeneous
coordinates, including conversions, rotations, translations, and transformations between
frames. This information is the basis for the navigational relationship used
throughout the thesis. The next section expands on some of these concepts as they
relate to aircraft formation.

2.1.4 Formation Coordinate Frames. Two aircraft, known as the lead aircraft
and the wing aircraft, define a basic formation. Multi-element and multi-ship
formations extend this to any number of additional aircraft. Navigation
within a formation is always a complex endeavor that involves extensive planning and
discussion between all involved operators to ensure safe operations.

Formation also complicates the aircraft’s navigation. Position and orientation
with respect to an Earth-referenced navigation frame are now coupled with position
and orientation with respect to single or multiple formation aircraft frames. Location
in the formation dictates when reference to one frame will have priority over the
other, though neither is used completely independently of the other. For a two-ship
formation, the lead aircraft is usually the only additional navigational reference point
for the wing aircraft.

The lead aircraft has two separate coordinate systems. The first is the lead
navigation frame, or $n_L$-frame. This frame is independent of aircraft orientation and
was further described in Section 2.1.2.3.

The second coordinate system defined with respect to the lead aircraft is the
lead body frame, or $b_L$-frame. This frame is rigidly attached to lead and was further
described in Section 2.1.2.4.

Other aircraft within the formation define their location based on one of the
frames of the lead aircraft. While there is no established rule on which frame is
preferred, in practice the choice depends on the distance from the lead aircraft. The
farther away an aircraft is, the harder it becomes to visually determine the attitude
of the lead aircraft, and the $n_L$-frame is used accordingly. When the aircraft is close and
the lead aircraft’s attitude is readily apparent, navigation is typically done with the use of
the $b_L$-frame. For the purposes of AAR, the aircraft will be close, and normally the
$b_L$-frame is used.




The other aircraft in the formation also have coordinate frames associated
with them. The wing aircraft have both a b-frame, or $b_W$-frame, and an n-frame,
or $n_W$-frame. Both are defined in the same way as the coordinate frames of the lead
aircraft.

The aspects of formation navigation, as they relate to AAR, will be further
discussed in Chapter 3. The next section details the background information on the
programming, including the open-source libraries used in the research.

2.2  Programming  Libraries 

A  large  portion  of  this  research  is  facilitated  by  the  programming  code  that 
ultimately  provides  the  navigational  solution.  The  programming  for  this  research 
used  the  C  programming  language. 

Within  the  C  language,  two  powerful  image  rendering  and  image  processing 
libraries  exist,  OpenGL  and  OpenCV.  Both  libraries  provide  critical  capabilities  to 
their  respective  areas,  and  ultimately  to  this  research.  OpenGL  is  a  graphics  rendering 
library  that  is  an  industry  standard  and  provides  fast,  accurate,  and  flexible  images. 
OpenCV  is  a  computer  vision  library  that  utilizes  quick  operations  on  matrices  and 
allows  real-time  analysis  of  images.  Many  terms,  especially  the  names  of  reference 
frames,  are  not  industry  standard  but  were  chosen  to  minimize  confusion  between 
the  different  disciplines  covered  in  this  research.  To  limit  the  depth  of  this  section, 
many  important  steps  and  processes  are  not  presented.  The  intent  of  this  section  is  to 
provide  a  general  understanding  of  these  libraries  for  readers  unfamiliar  with  them  and 
to  demonstrate  relationships  with  physical  cameras  and  images.  This  section  looks 
at the basics of both libraries and their interactions. More in-depth explanations
throughout the thesis further expand on the libraries’ interaction with the research.

2.2.1 OpenGL. OpenGL is a programming library that interfaces with
a platform’s graphics hardware. The “GL” portion of the library name stands for
Graphics Library. Because of its platform independence, ease of use, and rendering
accuracy and speed, OpenGL is used throughout the computer graphics industry.
Using OpenGL to render a virtual image (Ir) of objects involves setting up a scene
with objects defined by collections of small polygons and applying appropriate lighting
conditions in a customizable viewing volume. In other words, it involves defining a
volume of space (called a frustum) and placing objects inside the frustum to see them
on the screen.

The main assumption behind the OpenGL library is that a collection of polygons
can approximate any object in a scene. As the number of polygons increases
and their size decreases, the smoothness of textures and surfaces increases towards a
representation where individual polygons are not recognizable because of their minute
size. Various editing programs exist to create the objects in a polygon representation.
Coloring individual polygons to match the actual or desired texture of the object adds
realism to the scene.

Orientation in the OpenGL world requires coordinate frames, similar to those
described in Section 2.1.2. The user determines the location of the OpenGL world
frame (GL-frame), anywhere in the OpenGL world, in whatever units are required by
the user. It is simply defined by referencing other items in its coordinate frame.

Likewise, the user is free to determine other frames. A camera in the OpenGL
world can exist wherever the user determines. Typically the optical center of the
camera is co-located with the origin of the GL-frame, but this is not required. The
camera has a frame associated with it: the GLcam-frame. In contrast to the camera
defined in Section 2.1.2.5, the GLcam-frame has the negative Z axis projecting into
the viewing area, the Y axis defines the vertical axis in the up direction, and the
X axis defines the horizontal axis to the right. More OpenGL-specific frames will be
introduced when needed.

Applying the parameters of a camera, detailed in Section 2.1.3.5, customizes the
Ir produced by OpenGL to mimic the Ic collected by the camera. By modifying the
OpenGL frustum to approximate the viewing characteristics of a camera and lens,
renderings of an object can approximate what the object would look like if a camera
collected the images. The bulk of this research relies on the correct creation of an
Ir for comparison with an Ic of the same object. The following sections describe a basic
OpenGL setup.

2.2.1.1 OpenGL setup. There are two different types of views available
in OpenGL: orthographic and projective. An orthographic view keeps parallel lines
parallel. For example, an image of railroad tracks would never show the two
rails  intersecting.  Orthographic  views  are  used  when  a  fixed  relative  scale  is  required, 
regardless  of  the  distance  from  an  object.  For  example,  engineering  diagrams  of  a 
building  would  use  an  orthographic  view  to  keep  the  floors  parallel  and  the  intersection 
of  floors  and  walls  perpendicular. 

A projective view is what lenses, including the human eye, capture. A projective
view does not maintain parallel lines; instead, the view introduces the concept of a
vanishing point. Those same railroad tracks eventually intersect at a theoretical
distance of infinity in a projective view. Humans understand this concept naturally;
as an example, train conductors do not slow down trains because the tracks appear to
narrow. Although both views are available in OpenGL, a projective view produces
images that represent a real world scene.

Defining the projective view in OpenGL requires the creation of a projective
frustum, best understood visually in Figure 2.14. The definition of a frustum requires
defining a near and far plane (zNear and zFar), a Field of View (FOV) (typically the
FOV in the Y direction, FOVy, in degrees or radians), and an aspect ratio between
FOVx and FOVy. Alternatively, through trigonometric relationships of the truncated
four-sided pyramid, defining left, right, top, and bottom of the near clipping plane can
also define the frustum. Both methods are similar; however, the latter allows a little
more customization of the scene. The term clipping planes describes the imaginary
walls of the frustum because they clip objects from the eventual Ir of the scene.
Figure 2.14 shows these terms visually with the orientation of the GLcam-frame.
Figure 2.14: OpenGL viewing volume [21]. Using the respective FOVs or the definition
of the zNear plane defines the space in the OpenGL world that the camera can
potentially see.

OpenGL  contains  a  few  different  functions  that  create  a  projective  frustum.  A 
simple  view  would  utilize  the  glFrustum( )  command  as  shown  in  Listing  2.1  [19,21].  It 
is  important  to  note  that  the  values,  zNear  and  zFar,  are  distances,  not  translations. 


Listing 2.1: glFrustum( ) function declaration

void glFrustum(GLdouble left, GLdouble right, GLdouble bottom,
               GLdouble top, GLdouble zNear, GLdouble zFar);


To  simulate  the  view  as  though  a  lens  of  a  camera  created  it,  the  following  solves 
for  the  respective  FOVs: 


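\[
FOV_y = 2 \tan^{-1}\left(\frac{h}{2f}\right) = 2 \tan^{-1}\left(\frac{M}{2 f_y}\right) \tag{2.30}
\]

\[
FOV_x = 2 \tan^{-1}\left(\frac{w}{2f}\right) = 2 \tan^{-1}\left(\frac{N}{2 f_x}\right) \tag{2.31}
\]

which can be used to solve for the parameters in Listing 2.1.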
To understand how an object in the frustum appears on the screen, another
frame critical to the use of OpenGL is introduced: the normalized device
coordinate frame. For consistent notation, it is represented in this thesis as the
NDC-frame. The OpenGL engine uses this frame to determine what, where, and
how to place objects on the eventual Ir. Located in this frame is a cube termed
the canonical viewing volume (CVV). It has a size of two units by two units by
two units with the NDC-frame origin at its center. This frame has the unfortunate
characteristic of being left-handed, with the Y axis projected out the top of the viewing
volume, the X axis out the right, and the Z axis into the frustum. Every part of a
scene transforms into this frame for at least two simple determinations.

As an over-simplification of this process, the NDC-frame is shown co-located
with the GLcam-frame in Figure 2.15. Because of the transformations involved in going
from the cam-frame to the NDC-frame, the conversion is not as simple as negating the
Z axis of the two frames.


Figure 2.15: OpenGL canonical viewing volume. The OpenGL engine uses this volume
to determine what objects in the scene can be placed on the Ir. At the center of the
cube is the origin of the NDC-frame.
OpenGL first uses the CVV to determine what objects, or parts of an object,
should appear on the screen. If the transformed position falls within the cube, it has
the potential to be seen on the screen. Second, OpenGL determines which objects
within the cube obscure other objects. If two objects, or more precisely if two pixels
from the scene, have the same x and y translation in this frame, then the pixel with
the lower z translation will be shown (those with z translations less than negative one
were already discarded in the first step).

As a simplified overview of the complete OpenGL rendering process, the following
steps occur. First, the projective GL world is transformed such that the projective
viewing volume shown in Figure 2.14 becomes a rectangular shape that is then scaled
to the shape of the CVV shown in Figure 2.15. Second, collapsing the contents of
the cube onto the rear wall (located at z = -1 in the NDC-frame) creates a two unit
by two unit image of the scene. Third, scaling that image to the Ir size required by
the user (nominally, the on-screen window size) completes the process. Figure 2.16
details the entire transformation that is presented in the following sections. The first
two transformations, T1 and T2, from the figure are modified slightly in Chapter 5.

2.2.1.2 GLcam-frame to CVV. The transformation from the GLcam-frame
to the NDC-frame and the projective frustum to the CVV is a two-step process.
The first step transforms the lines emanating from the GLcam-frame origin to lines
that are parallel (T1) and then scales the perspective to the scale of the CVV (T2).
For simplicity, an intermediate coordinate frame is not defined for the intermediate
step between the GLcam-frame and the NDC-frame; instead both are referenced in the
NDC-frame with an understanding that a secondary transformation occurs.

This  first  step  does  not  create  an  orthographic  view  of  the  scene;  rather  it  allows 
the  projective  view  to  be  scaled  such  that  it  becomes  rectangular  in  shape.  Since  the 
computer  does  not  have  a  lens  to  create  this  view,  it  does  it  digitally  by  increasing 
the scale of objects closer to the zNear clipping plane while decreasing the scale of
those closer to the zFar clipping plane.
Figure 2.16: Overview of the OpenGL process. The OpenGL process starts with a defined
viewing frustum (pyramid) and transforms it into a 2x2x2 cube; the contents of the cube
are placed on a 2x2 image that is expanded to the necessary size.


As an example, compare the relative size of the two arrows in Figures 2.17 and
2.18. In the projective view, Arrow 1 is located close to the zNear clipping plane and
Arrow 2 close to the zFar clipping plane, both within the defined projective frustum.
Additionally, both have the same height (visually, the length in the y translation).
Because Arrow 1 is closer to the camera frame, it spans a larger angle of the frustum's
FOVy. In contrast, Arrow 2 is farther away and spans a smaller angle. The first
transformation converts all the lines emanating from the center of projection (the
GLcam-frame origin) into parallel lines.

This transformation appropriately scales the entire scene in the GLcam-frame
such that the four-sided, truncated pyramid-shaped frustum becomes rectangular in
shape. A transformation matrix (T1) represents the first transformation:

\[
T_1 = \begin{bmatrix}
zNear & 0 & 0 & 0 \\
0 & zNear & 0 & 0 \\
0 & 0 & zNear + zFar & zNear \cdot zFar \\
0 & 0 & -1 & 0
\end{bmatrix} \tag{2.32}
\]
Figure 2.17: Two arrow example, projective frustum. In this example, the two arrows
referenced in the GLcam-frame appear to have the same height.

Figure 2.18: Two arrow example, NDC-frame. The example scene from Figure 2.17
is transformed, such that the lines emanating from the origin of the GLcam-frame
become parallel. Arrow 1 is now scaled larger than Arrow 2.
This first transformation does not account for scaling (the projective view frustum
scale to the cube scale). This is illustrated in Figure 2.18, as the arrows do not
reside within the CVV. The second step (T2) scales the transformation accomplished
in the first step appropriately; in other words, the rectangular-shaped frustum becomes
cube-shaped. For completeness, there is a final step that normalizes the homogeneous
coordinates. In the second step, shown in Figure 2.19, the two arrows both fall within
the CVV and both have the potential to be seen on the Ir. However, in this example
Arrow 1 would most likely cover up Arrow 2, and Arrow 2 would not be seen on the Ir.


Figure 2.19: Two arrow example, CVV. The second step in the transformation process
scales the scene in the NDC-frame, such that the projective viewing volume defined
by glFrustum( ) is transformed to the CVV.

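The second transformation is represented as a transformation matrix (T2):

\[
T_2 = \begin{bmatrix}
\frac{2}{right - left} & 0 & 0 & -\frac{right + left}{right - left} \\
0 & \frac{2}{top - bottom} & 0 & -\frac{top + bottom}{top - bottom} \\
0 & 0 & \frac{2}{zNear - zFar} & \frac{zFar + zNear}{zNear - zFar} \\
0 & 0 & 0 & 1
\end{bmatrix} \tag{2.33}
\]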
This transformation (T2) is the same transformation required if the original
scene were an orthographic view and not projective. It can be created with a call to
the OpenGL library function glOrtho( ), shown in Listing 2.2 [22].


Listing 2.2: glOrtho( ) function declaration

void glOrtho(GLdouble left, GLdouble right, GLdouble bottom,
             GLdouble top, GLdouble zNear, GLdouble zFar);


OpenGL also creates a similar transformation with the call to glFrustum( ). The
transformation created by glFrustum( ) is the same as the combination T2·T1 (placed
in this order because the original points are pre-multiplied by the transformations).

The transformation created by glFrustum( ) is shown in Equation (2.34). This
transformation accomplishes the two steps of the GLcam-frame to CVV transformation.


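\[
glFrustum = \begin{bmatrix}
\frac{2 \cdot zNear}{right - left} & 0 & \frac{right + left}{right - left} & 0 \\
0 & \frac{2 \cdot zNear}{top - bottom} & \frac{top + bottom}{top - bottom} & 0 \\
0 & 0 & \frac{zFar + zNear}{zNear - zFar} & \frac{2 \cdot zFar \cdot zNear}{zNear - zFar} \\
0 & 0 & -1 & 0
\end{bmatrix} = T_2 T_1 \tag{2.34}
\]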


This  one  step  transformation  is  seen  in  Figure  2.20,  with  the  inclusion  of  the 
lines  emanating  from  the  center  of  projection  in  each  viewing  volume. 
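The relationship between the two-step transformation and the single glFrustum( ) matrix can be checked numerically. The following sketch builds T1, T2, and the glFrustum( ) matrix as plain arrays and multiplies T2 by T1. All function names are hypothetical helpers written for this illustration; none of them are OpenGL calls, and the matrices are stored row-major to match the printed equations (OpenGL itself stores matrices column-major).

```c
#include <assert.h>

/* 4x4 matrices stored row-major: element (row, col) is m[4 * row + col]. */

static void zero16(double m[16])
{
    int i;
    for (i = 0; i < 16; ++i) m[i] = 0.0;
}

/* First step: lines through the center of projection become parallel. */
void build_t1(double m[16], double z_near, double z_far)
{
    assert(z_near > 0.0 && z_far > z_near);
    zero16(m);
    m[0]  = z_near;
    m[5]  = z_near;
    m[10] = z_near + z_far;
    m[11] = z_near * z_far;
    m[14] = -1.0;
}

/* Second step: the rectangular volume is scaled and shifted into the CVV. */
void build_t2(double m[16], double left, double right, double bottom,
              double top, double z_near, double z_far)
{
    assert(right != left && top != bottom && z_near != z_far);
    zero16(m);
    m[0]  = 2.0 / (right - left);
    m[3]  = -(right + left) / (right - left);
    m[5]  = 2.0 / (top - bottom);
    m[7]  = -(top + bottom) / (top - bottom);
    m[10] = 2.0 / (z_near - z_far);
    m[11] = (z_far + z_near) / (z_near - z_far);
    m[15] = 1.0;
}

/* Combined matrix in the form shown for glFrustum(). */
void build_frustum(double m[16], double left, double right, double bottom,
                   double top, double z_near, double z_far)
{
    assert(right != left && top != bottom && z_near != z_far);
    zero16(m);
    m[0]  = 2.0 * z_near / (right - left);
    m[2]  = (right + left) / (right - left);
    m[5]  = 2.0 * z_near / (top - bottom);
    m[6]  = (top + bottom) / (top - bottom);
    m[10] = (z_far + z_near) / (z_near - z_far);
    m[11] = 2.0 * z_far * z_near / (z_near - z_far);
    m[14] = -1.0;
}

/* out = a * b for row-major 4x4 matrices; out must not alias a or b. */
void mat4_mul(double out[16], const double a[16], const double b[16])
{
    int r, c, k;
    for (r = 0; r < 4; ++r) {
        for (c = 0; c < 4; ++c) {
            double sum = 0.0;
            for (k = 0; k < 4; ++k) sum += a[4 * r + k] * b[4 * k + c];
            out[4 * r + c] = sum;
        }
    }
}
```

Calling mat4_mul( ) with the outputs of build_t2( ) and build_t1( ), in that order, should reproduce the output of build_frustum( ) for the same arguments, up to floating-point rounding.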

For an example of this transformation, consider a projective viewing volume
that is symmetric both horizontally and vertically (equal viewing area on both sides
of the Z axis). The function call terms can be replaced with width (W) and height
(H) values, such that left = -W/2, right = W/2, bottom = -H/2, and top = H/2,
causing the first two values in the third column in Equation (2.34) to be zero. With
that setup, consider a point in the GLcam-frame with the following translations:


\[
\mathbf{p}^{GLcam} = \begin{bmatrix} -\frac{W}{2} \\ \frac{H}{2} \\ -zNear \\ 1 \end{bmatrix} \tag{2.35}
\]
Figure 2.20: The glFrustum( ) transformation. The function creates a transformation
matrix that maps the eight corners of the projective viewing volume into the shape
of a cube.

This point is located in the projective-view frustum on the near clipping plane in
the top left corner, shown as a circle on the pyramid in Figure 2.20. Pre-multiplying
this point by the transformation created by glFrustum( ) results in the intermediate
point:


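\[
\mathbf{p}^{NDC} = \begin{bmatrix}
\frac{2 \cdot zNear}{W} & 0 & 0 & 0 \\
0 & \frac{2 \cdot zNear}{H} & 0 & 0 \\
0 & 0 & \frac{zFar + zNear}{zNear - zFar} & \frac{2 \cdot zFar \cdot zNear}{zNear - zFar} \\
0 & 0 & -1 & 0
\end{bmatrix}
\begin{bmatrix} -\frac{W}{2} \\ \frac{H}{2} \\ -zNear \\ 1 \end{bmatrix}
= \begin{bmatrix} -zNear \\ zNear \\ -zNear \\ zNear \end{bmatrix} \tag{2.36}
\]

which normalizes to:

\[
\mathbf{p}^{NDC} = \begin{bmatrix} -1 \\ 1 \\ -1 \\ 1 \end{bmatrix} \tag{2.37}
\]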


This position is located on the top, left, rear corner of the eight-cubic-unit CVV cube
and would be seen on the image. Similarly, the other seven coordinates that make
up the pyramid would transform to the other seven corners of the cube. This section
covered the first step; the next section details the second step of the OpenGL process.
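The worked example of a frustum corner can be reproduced with a few lines of C. The sketch below applies the rows of the glFrustum( ) matrix to a point given in the GLcam-frame and then divides by the fourth (homogeneous) component. The function name is a hypothetical helper for illustration and is not an OpenGL call.

```c
#include <assert.h>

/* Maps a point p = (x, y, z) in the GLcam-frame to the NDC-frame by
 * applying the glFrustum() matrix to [x, y, z, 1]^T and normalizing. */
void glcam_to_ndc(double left, double right, double bottom, double top,
                  double z_near, double z_far,
                  const double p[3], double ndc[3])
{
    double cx, cy, cz, cw;

    assert(right != left && top != bottom && z_near != z_far);

    /* Each line is one row of the matrix times the homogeneous point. */
    cx = (2.0 * z_near / (right - left)) * p[0]
       + ((right + left) / (right - left)) * p[2];
    cy = (2.0 * z_near / (top - bottom)) * p[1]
       + ((top + bottom) / (top - bottom)) * p[2];
    cz = ((z_far + z_near) / (z_near - z_far)) * p[2]
       + (2.0 * z_far * z_near) / (z_near - z_far);
    cw = -p[2];

    /* A point in the plane of the camera origin cannot be normalized. */
    assert(cw != 0.0);

    ndc[0] = cx / cw;
    ndc[1] = cy / cw;
    ndc[2] = cz / cw;
}
```

For a symmetric frustum with W = 4, H = 2, zNear = 1, and zFar = 10 (values chosen only for illustration), the near-plane corner (-2, 1, -1) maps to approximately (-1, 1, -1), and the far-plane corner (20, 10, -10) maps to approximately (1, 1, 1).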

2.2.1.3 CVV to GLimage-frame. The final transformation of an object
to an Ir is now presented. Locations on an Ir are referenced in the GLimage-frame.
This frame is located with the origin at the bottom-left corner of the resulting Ir. The
X axis projects out the right side of the image, parallel to the X axis of the
GLcam-frame, and the Y axis projects out the top of the image, parallel to the Y axis
of the GLcam-frame.

Demonstrating the transformation first requires the GLimage'-frame as an
intermediary frame. This frame is defined similarly to the NDC-frame but is located
at the bottom, left, rear corner of the CVV cube, or one unit below, one unit to
the left, and one unit behind the NDC-frame origin, as shown in Figure 2.21. The
transformation of a point located in the NDC-frame to the same point referenced in
the GLimage'-frame requires adding one unit to all three coordinates.

Similar to the cam-frame to image-frame transformation (Section 2.1.3.5), the
projection of the NDC-frame onto the Ir is shown in Figure 2.21 to visualize the
conversion process. To make the transformation into the GLimage-frame, the image
is scaled to the size of the user-defined window, which is typically not created using
the OpenGL library itself (however, various utility wrappers exist, such as the OpenGL
Utility Toolkit (GLUT), that will create these windows). As shown in Figure 2.22, the
scaling for the X axis uses the ratio of the desired width in pixels, N, over the current
width, two, or (N/2), with similar scaling used for the Y axis (M/2, with M the desired
height in pixels). This is the default transformation for OpenGL, while additional
customization can be used to move the origin anywhere in the image and to define a
new width and height for a smaller viewing window.
Figure 2.21: The GLimage'-frame. The frame is located at the bottom, left, rear
corner of the CVV cube. The projection of the NDC-frame is shown projected onto
the image.

Figure 2.22: The GLimage'-frame to GLimage-frame. The image in the GLimage'-frame
is stretched to the user's need to create the Ir.
The mathematical transformation from CVV to GLimage-frame is shown:

\[
x^{GLimage} = \left( x^{NDC} + 1 \right) \frac{N}{2} \tag{2.38}
\]

\[
y^{GLimage} = \left( y^{NDC} + 1 \right) \frac{M}{2} \tag{2.39}
\]

The total transformation from an object in the GL world to an image combines
the two transformations, GLcam-frame to CVV and CVV to GLimage-frame.
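The shift-and-scale step from the CVV to the GLimage-frame is small enough to sketch directly. The function below adds one unit to the NDC-frame x and y translations and scales by half the desired image width and height. The function and parameter names are illustrative assumptions; in an OpenGL program this mapping is normally configured through glViewport( ) rather than computed by hand.

```c
#include <assert.h>

/* Converts NDC-frame translations in [-1, 1] to GLimage-frame translations
 * for an image that is width_px wide and height_px high (origin at the
 * bottom-left corner of the image). */
void ndc_to_glimage(double x_ndc, double y_ndc,
                    int width_px, int height_px,
                    double *x_img, double *y_img)
{
    assert(width_px > 0 && height_px > 0);
    assert(x_img != 0 && y_img != 0);

    /* Shift into the GLimage'-frame, then stretch 2 units to the pixel size. */
    *x_img = (x_ndc + 1.0) * (width_px / 2.0);
    *y_img = (y_ndc + 1.0) * (height_px / 2.0);
}
```

With a 640 by 480 image, the CVV corner (-1, -1) lands on the image origin (0, 0), the center (0, 0) lands on (320, 240), and the corner (1, 1) lands on (640, 480).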

To render a scene in OpenGL, the typical process starts with the origin of
the model located at the origin of the GLcam-frame, shown in Figure 2.14 as the
intersection of the three axes. The process then translates and rotates the model into
the frustum, at which point the conversion process presented in this section occurs.
If portions of the model, in its final position prior to rendering, fall outside of the
frustum, the process clips them from the resulting rendered image, Ir, as detailed
further in the next section.

2.2.1.4 OpenGL Image. An advantage of OpenGL rendering is that it
is efficient. The OpenGL rendering engine creates and displays an Ir quickly, typically
without extensive modification, and then discards it (clearing memory space for the
next image). To operate efficiently, OpenGL references an Ir as a 1-dimensional
array [1]. An array for a simple Ir with a height of six pixels and a width of nine
pixels is shown in Figure 2.23.

Figure 2.23: OpenGL image storage configuration [1]. OpenGL accesses pixel
information from an image using a single array with a length equal to the total number
of image pixels x three.

As a drawback to OpenGL accessing images in this manner, the ability to access
a single pixel, or the color value of a single pixel, is not as efficient as it is with an
OpenCV image (shown in the next section). Accessing a pixel in the OpenGL image
structure requires some knowledge of the image, additional programming to access
it, and again to modify it. A simplistic example is increasing the value of the green
component for every pixel in the image represented in Figure 2.23. The value in the
second array storage location (G in pixel one) and the value in every third array
storage location thereafter would be accessed directly and modified, and then the image
could be displayed. Additionally, the rendering process must be complete and the
image located on the visual producing hardware before access to pixels is available.
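The green-component example can be sketched in C for an image that has already been copied into ordinary memory as a single interleaved array (R, G, B, R, G, B, ...). The function name and the saturation at 255 are assumptions made for this illustration. The sketch also assumes tightly packed rows with no padding, which may not hold for a buffer read back from OpenGL (for example with glReadPixels( )) unless the pack alignment has been set accordingly.

```c
#include <assert.h>
#include <stddef.h>

/* Adds `amount` to the green component of every pixel in an interleaved
 * 8-bit RGB array, clamping the result at 255. */
void brighten_green(unsigned char *rgb, size_t width, size_t height,
                    unsigned char amount)
{
    size_t n_values = width * height * 3;
    size_t i;

    assert(rgb != NULL);

    /* Green is the second value of each pixel: indices 1, 4, 7, ... */
    for (i = 1; i < n_values; i += 3) {
        unsigned int g = (unsigned int)rgb[i] + amount;
        rgb[i] = (unsigned char)(g > 255u ? 255u : g);
    }
}
```

The stride of three in the loop is the "knowledge of the image" the text refers to: the caller has to know the channel order and count before a single component can be located.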

Of note, the bottom left of an image (the origin of the GLimage-frame) is the
location of the first pixel in an OpenGL image, contrary to the location of the first
pixel in an OpenCV image, which is in the top left.
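Because the two libraries place the first pixel in opposite corners, an image array moved from one convention to the other generally needs its rows reversed. The following sketch copies a tightly packed image while flipping it vertically. The function name and the separate source and destination buffers are assumptions for illustration and do not describe how this research passed images between the libraries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies src to dst with the row order reversed, so that a bottom-left
 * origin image becomes a top-left origin image (or the reverse).
 * Rows are assumed tightly packed: width * channels bytes each. */
void flip_rows(const unsigned char *src, unsigned char *dst,
               size_t width, size_t height, size_t channels)
{
    size_t row_bytes = width * channels;
    size_t row;

    assert(src != NULL && dst != NULL && src != dst);

    for (row = 0; row < height; ++row) {
        /* Row `row` of the source becomes row (height - 1 - row). */
        memcpy(dst + (height - 1 - row) * row_bytes,
               src + row * row_bytes,
               row_bytes);
    }
}
```

An alternative that avoids the copy is to leave the data in place and record the origin convention in the image header, which is the purpose of the origin field in the OpenCV image structure.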

This completes the introduction to OpenGL; more specifics of the library are
presented throughout the thesis. The next section presents the details of the OpenCV
library, followed by the interaction between the two libraries.

2.2.2 OpenCV. OpenCV is a programming library that provides tools for
analyzing images, typically an Ic acquired from an optical-type sensor, such as a
camera, infrared sensor, radar, etc. The "CV" portion of the library's name stands
for Computer Vision. The tools the library provides extract quality information from
the images.

The basis of OpenCV's analysis is access to an image as a
matrix with a width and height equal to the image's width and height in pixels. For
color, the matrix has an additional dimension with additional layers for color
components, as shown in Figure 2.24. Each value in the matrix corresponds to the color
or grayscale intensity of the corresponding pixel. By realizing images in this way,
quick, effective, and accurate analysis is possible. As an example, to increase the value
of the green component in every pixel in an image, the entire memory for the green
component (a single level of the matrix) can be accessed at one time and modified
collectively, a more efficient method than OpenGL would implement. The library also
allows for efficient modification and analysis of the images in other ways. By modifying
the values of the pixels, images can be sharpened, smoothed, darkened, lightened, and
more.

Figure 2.24: OpenCV image storage configuration [1]. OpenCV accesses pixel
information from an image using a single or multiple level matrix.

Storage of an image in the OpenCV library is done with the use of a data structure.
The structure is referenced as IplImage, originally defined as part of Intel's Image
Processing Library (the Ipl in IplImage) [3]. This structure is shown in Listing 2.3.

The  actual  memory  storage  locations  of  the  pixels  of  an  image  are  contained 
in the imageData portion of the IplImage (a pointer to the first pixel of image data).
The  other  variables  of  the  structure  provide  useful  information  about  the  image,  most 
of  which  are  intuitive.  Values  in  some  of  the  variables  determine  the  most  efficient 
memory  allocation  and  storage.  A  black  and  white  image  uses  fewer  layers  of  the 
storage  matrix  and  generally  less  memory  than  a  color  image. 

Listing 2.3: OpenCV image structure

typedef struct _IplImage
{
    int    nSize;
    int    ID;
    int    nChannels;
    int    alphaChannel;
    int    depth;
    char   colorModel[4];
    char   channelSeq[4];
    int    dataOrder;
    int    origin;
    int    align;
    int    width;
    int    height;
    struct _IplROI *roi;
    struct _IplImage *maskROI;
    void  *imageId;
    struct _IplTileInfo *tileInfo;
    int    imageSize;
    char  *imageData;
    int    widthStep;
    int    BorderMode[4];
    int    BorderConst[4];
    char  *imageDataOrigin;
}
IplImage;

The images accessed by OpenCV are referenced in the two-axis CVimage-frame.
The origin of the frame is in the top left corner of the image, with the X axis denoting
horizontal pixel location to the right and the Y axis denoting vertical pixel location
below. The first pixel in the top left corner is at position [0, 0] in the image.

The benefits of the OpenCV image structure are evident in the speed at which
the  library  processes  images.  Some  of  the  useful  functions  used  by  this  research  are 
presented  in  the  next  portions  of  this  section. 

2.2.2.1 OpenCV Find Contour. As the name of the function implies,
cvFindContours( ) finds contours in an image. A collection of points found in an image
that appear connected, such as a line or curve, defines a contour. Often
an outline of an object or identifiable marks on the object define these curves. This
function searches through an image and returns a collection of contours.

As an example, the code snippet in Listing 2.4 demonstrates the use of
cvFindContours( ). The snippet takes an IplImage (defined as image), thresholds the
image (effectively reducing the effects of strong illumination and reflection), finds the
contours, and then draws the contours on a new IplImage (contour_image), which can be
displayed as needed. The results of this function call can be seen in Figure 2.25.

2.2.2.2 OpenCV Match Template. Another important function that
OpenCV provides is a template matching function. This function accepts two IplImages
regardless of size. If the images are different sizes, the smaller IplImage is the
template, and the larger IplImage is what the template is matched against. If the
images are the same size, they are simply matched against each other. The symbol It
represents the template image, and Im represents the larger image to match against.
Using a matching algorithm (detailed next), the function systematically compares It
with every possible portion of Im. The function returns to the user a matrix of values
that result from the matching. The dimensions of the returned matrix are equal to the
difference in dimensions between the two original IplImages plus one. If the two images
are the same size, the returned matrix is a single value. As an example, the two
images in Figure 2.26 differ by exactly 100 pixels in each dimension (scaled to fit on
the page). The resulting 101x101 matrix holds 10,201 values, one for every possible
location of It in Im. The symbol R represents the resulting matrix, and R(i, j)
represents the value in the matrix at the ith row and the jth column. A subscript
following the R denotes the method used to compute it (such as Rcorr; the subscript
notation matches the methods presented in [3]).

Listing 2.4: OpenCV cvFindContours( ) function pseudocode

// An OpenCV IplImage already exists named image
IplImage contour_image (width x height)  // Create an OpenCV image
Memory g_storage (width x height x 3)    // Create contour storage

// Threshold the image and place the results in contour_image
cvAdaptiveThreshold(image, contour_image)

// Find contours and place in storage
cvFindContours(contour_image, g_storage)

// Clear the created image
cvZero(contour_image)

// Draw the contours on contour_image
cvDrawContours(contour_image, g_storage)

Figure 2.25: OpenCV cvFindContours( ) function. This function finds the contours
of an image (left side), and in this application of it, places the contours in another
image (right side).

Figure 2.26: OpenCV cvMatchTemplate( ) function. This function matches the template
on the right side, It, with the image on the left, Im.

Typically, the information gathered from cvMatchTemplate( ) is the location in
Im where It most likely matches and the result of the chosen matching function at
that location. The returned locations of the match are referenced in the
CVimage-frame. The symbol r denotes a two by one position vector of the location
of the most likely match.
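Locating the most likely match amounts to searching the result matrix R for its extreme value. The sketch below scans a row-major result array for its largest entry and reports the column and row, which together play the role of the position vector r. The function name is a hypothetical helper; OpenCV offers cvMinMaxLoc( ) for this purpose, and whether the maximum or the minimum marks the best match depends on the matching method chosen.

```c
#include <assert.h>
#include <stddef.h>

/* Finds the location of the largest value in a row-major result matrix
 * with result_w columns and result_h rows. On ties, the first occurrence
 * in row-major order is reported. */
void best_match(const double *result, size_t result_w, size_t result_h,
                size_t *best_x, size_t *best_y)
{
    size_t i, best = 0;

    assert(result != NULL && result_w > 0 && result_h > 0);
    assert(best_x != NULL && best_y != NULL);

    for (i = 1; i < result_w * result_h; ++i) {
        if (result[i] > result[best]) best = i;
    }

    *best_x = best % result_w;  /* column: X translation in the CVimage-frame */
    *best_y = best / result_w;  /* row: Y translation, measured downward */
}
```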

There are six different matching functions available in cvMatchTemplate( ) [3];
however, three of them are normalized versions of the other three. These normalized
versions help reduce the effects of lighting and shading on the images. Two of the six
were used in this research, both of them normalized, and the details of those two
are presented. The following nomenclature is used in this section: Wt and Ht are the
width and height of the template, Wm and Hm are the width and height of the image
to match against, x and y represent the translations of specific pixels in R, and x', y'
and x'', y'' represent the translations of specific pixels in It (referenced twice in one
equation). All the translations are referenced in the CVimage-frame.

The first matching method is the correlation matching method [3]. This method
determines a match by multiplying the values in the images together and then squaring
them. This is similar to a sum squared difference, with a multiplication instead of a
difference. A perfect match will be large, and bad matches will be small or zero:

\[
R_{corr}(x, y) = \sum_{x', y'} \left( I_t(x', y') \cdot I_m(x + x', y + y') \right)^2 \tag{2.40}
\]
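As a minimal sketch of the sliding comparison, the function below evaluates the correlation score exactly as written in Equation (2.40) for every placement of the template inside the larger image. The function name, the use of double-precision grayscale arrays, and the row-major layout are assumptions for illustration; this is not the OpenCV implementation, whose internals may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major grayscale images: pixel (x, y) is data[y * width + x].
 * tmpl is wt x ht, img is wm x hm, and result must hold
 * (wm - wt + 1) * (hm - ht + 1) values, also row-major. */
void match_corr(const double *tmpl, size_t wt, size_t ht,
                const double *img, size_t wm, size_t hm,
                double *result)
{
    size_t wr, hr, x, y, xp, yp;

    assert(tmpl != NULL && img != NULL && result != NULL);
    assert(wt >= 1 && ht >= 1 && wt <= wm && ht <= hm);

    /* Result dimensions: difference in image dimensions plus one. */
    wr = wm - wt + 1;
    hr = hm - ht + 1;

    for (y = 0; y < hr; ++y) {
        for (x = 0; x < wr; ++x) {
            double sum = 0.0;
            for (yp = 0; yp < ht; ++yp) {
                for (xp = 0; xp < wt; ++xp) {
                    /* Product of the overlapping pixels, then squared. */
                    double prod = tmpl[yp * wt + xp]
                                * img[(y + yp) * wm + (x + xp)];
                    sum += prod * prod;
                }
            }
            result[y * wr + x] = sum;
        }
    }
}
```

The four nested loops make the cost grow with the product of the result size and the template size, which is one reason a small number of candidate images and a modest template size matter for real-time use.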
The second method is the correlation coefficient matching method [3]. This
method matches It relative to its mean against Im relative to its mean. A
perfect match would be one and a perfect mismatch would be negative one. The
process determines the mean value of It and Im and subtracts the respective mean
from the pixel values of each image. This creates new images, denoted as I't and I'm,
detailed in the following equations:

\[
I'_t(x', y') = I_t(x', y') - \frac{1}{W_t \cdot H_t} \sum_{x'', y''} I_t(x'', y'') \tag{2.41}
\]

\[
I'_m(x + x', y + y') = I_m(x + x', y + y') - \frac{1}{W_m \cdot H_m} \sum_{x'', y''} I_m(x + x'', y + y'') \tag{2.42}
\]


These  intermediate  images  are  then  multiplied,  squared,  and  summed  in  the 
same  manner  as  the  first  method  to  provide  a  value  between  negative  one  and  positive 
one  for  each  location  in  the  R  matrix; 

$$R_{coeff}(x, y) = \sum_{x', y'} \left( I'_t(x', y') \cdot I'_m(x + x', y + y') \right)^2 \tag{2.43}$$


The  normalized  versions  of  these  matching  methods  divide  the  resulting  matrix 
by  a  normalized  coefficient.  The  resulting  Rs  are  shown: 


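$$R_{corr\_normed}(x, y) = \frac{R_{corr}(x, y)}{\sqrt{\sum_{x', y'} I_t(x', y')^2 \cdot \sum_{x', y'} I_m(x + x', y + y')^2}} \tag{2.44}$$

$$R_{coeff\_normed}(x, y) = \frac{R_{coeff}(x, y)}{\sqrt{\sum_{x', y'} I'_t(x', y')^2 \cdot \sum_{x', y'} I'_m(x + x', y + y')^2}} \tag{2.45}$$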


Figure  2.27  shows  an  example  of  the  correlation  coefficient  matching  method 
applied  to  the  two  sample  images  shown  in  Figure  2.26. 
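The sliding-window comparison described above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the code used in this research: the function name `match_coeff_normed` and the sample arrays are hypothetical, and the arithmetic follows the normalized correlation coefficient formula as given in the OpenCV documentation (products are summed without squaring, and the image mean is taken over the window under the template), which is an assumption relative to the equations as printed here.

```python
import math

def match_coeff_normed(image, template):
    """Return R where R[y][x] is the normalized correlation coefficient of
    the template against the image window whose top-left corner is (x, y)."""
    h_m, w_m = len(image), len(image[0])
    h_t, w_t = len(template), len(template[0])
    n = w_t * h_t

    # Subtract the template mean once (mean-relative template).
    t_mean = sum(sum(row) for row in template) / n
    t_zero = [[v - t_mean for v in row] for row in template]
    t_energy = sum(v * v for row in t_zero for v in row)

    result = []
    for y in range(h_m - h_t + 1):          # result has (H_m - H_t + 1) rows
        out_row = []
        for x in range(w_m - w_t + 1):      # and (W_m - W_t + 1) columns
            window = [image[y + dy][x:x + w_t] for dy in range(h_t)]
            w_mean = sum(sum(row) for row in window) / n
            num = 0.0
            w_energy = 0.0
            for dy in range(h_t):
                for dx in range(w_t):
                    d = window[dy][dx] - w_mean
                    num += t_zero[dy][dx] * d
                    w_energy += d * d
            denom = math.sqrt(t_energy * w_energy)
            # A flat window has no variance; report "no correlation".
            out_row.append(num / denom if denom > 0 else 0.0)
        result.append(out_row)
    return result

# Hypothetical 4 x 5 image with the 2 x 2 template embedded at (x=1, y=1).
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 2, 0, 0],
    [0, 3, 4, 0, 0],
    [0, 0, 0, 0, 0],
]
template = [[1, 2], [3, 4]]
R = match_coeff_normed(image, template)
```

The result matrix has dimensions equal to the difference between the image and template dimensions plus one, and its largest entry sits at the location where the template was embedded.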




Figure 2.27: OpenCV cvMatchTemplate( ) result. The result of the cvMatchTemplate( ) applied to the images in Figure 2.26 is a matrix of dimensions equal to the difference in dimensions between $I_m$ and $I_t$ plus one. The negated values of the matrix, from the correlation coefficient matching method, are shown here as height in the plot.

The values of $R_{coeff\_normed}(x, y)$ are plotted as negative heights, so the tallest points mark where the images are closer to exact opposites (values close to negative one). From both Figures (2.26 and 2.27), it appears that the best match (or best mismatch) of the two images is near the middle of $I_m$. The next section introduces the interaction between the libraries.


2.2.3  OpenGL  to  OpenCV.  The  benefits  of  both  libraries  are  evident  as  they 
were  designed  to  be  efficient  at  what  they  do  best.  Fortunately,  both  can  accomplish 
portions  of  the  other’s  capabilities  when  needed,  but  not  as  efficiently  as  the  other.  To 
utilize  the  rendering  power  of  OpenGL  in  combination  with  the  image  manipulation 
and  comparison  power  of  OpenCV  it  is  necessary  to  pass  information  between  them. 
In  this  research,  the  interaction  between  the  two  libraries  was  limited  to  passing 
images.  It  was  required  that  images  rendered  by  OpenGL  be  compared  with  images 
collected  from  the  camera.  To  make  the  comparison,  the  OpenGL  images  had  to  be 
converted  to  OpenCV  images.  Because  of  the  different  storage  methods  of  the  two 
libraries,  and  the  lack  of  available  applications  requiring  both  OpenGL  and  OpenCV, 
an  open-source  conversion  process  was  not  available. 
According to [1], a possible solution requires accessing the color value of each pixel in the OpenGL image and arranging it in the proper sequence of a blank OpenCV image of the same size, as shown in Listing 2.5. In this example, an OpenGL image is rendered to the screen (not shown in the listing), and an OpenCV image (CVimage) is created and memory allocated with the same memory size as the OpenGL image. The OpenGL image is then read into the memory location with the glReadPixels( ) function. The for loop cycles through all the pixels of the OpenGL image, now stored in the memory location GLimage, and copies them to the correct location in CVimage. The memory location GLimage(0) would access the R component of the first pixel. Similarly, the CVimage->imageData(0,0,0) variable would point to the memory location of the first pixel of the initially blank OpenCV image.

The pixel transfer does not account for the difference in first pixel location between an OpenGL and an OpenCV image, so a flip of the image is accomplished with cvFlip( ). The other variables in the listing are used to systematically move through all the pixels in both images.

Listing 2.5: OpenGL to OpenCV image conversion pseudocode [1]

    IplImage CVimage (width x height)          // Create an OpenCV image
    Memory GLimage (width x height x 3)        // Create memory storage
    glReadPixels(width, height, GLimage)       // Screen image into storage

    windex, hindex = 0
    // windex is the width index, hindex is the height index

    for i from 0 to width * height * 3 by 3
        // Cycle through all the pixels in the image
        IF windex >= width
            // End of a row: move to the start of the next row
            windex = 0
            hindex = hindex + 1
        END IF

        // Place the pixels of GLimage into CVimage
        CVimage->imageData(hindex, windex, 0) = GLimage(i+2)   // B
        CVimage->imageData(hindex, windex, 1) = GLimage(i+1)   // G
        CVimage->imageData(hindex, windex, 2) = GLimage(i+0)   // R

        windex = windex + 1
    end

    cvFlip(CVimage);                           // Account for different origins
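The same conversion can be sketched in runnable form. The sketch below is a hedged illustration in plain Python rather than the OpenGL and OpenCV calls themselves: the function name `gl_rgb_to_cv_bgr` and the 2 x 2 sample buffer are hypothetical. It assumes the buffer holds tightly packed RGB bytes with the bottom row first (the usual OpenGL read-back order) and produces rows of BGR pixels with the top row first (the usual OpenCV order).

```python
def gl_rgb_to_cv_bgr(gl_buffer, width, height):
    """Convert a flat, bottom-up RGB byte sequence into a list of
    top-down rows, each row a list of (B, G, R) tuples."""
    rows = []
    for h in range(height):
        row = []
        for w in range(width):
            i = 3 * (h * width + w)          # index of this pixel's R byte
            r, g, b = gl_buffer[i], gl_buffer[i + 1], gl_buffer[i + 2]
            row.append((b, g, r))            # swap channel order to B, G, R
        rows.append(row)
    rows.reverse()                           # vertical flip, the role of cvFlip()
    return rows

# Hypothetical 2 x 2 read-back: bottom row is red, green; top row is blue, white.
gl_buffer = bytes([255, 0, 0,   0, 255, 0,
                   0, 0, 255,   255, 255, 255])
cv_rows = gl_rgb_to_cv_bgr(gl_buffer, 2, 2)
```

Indexing each pixel directly from its row and column removes the need for the separate windex and hindex counters, at the cost of one multiplication per pixel.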

This is the initial method of image conversion between programming libraries used in this research. This concludes the background section on programming; some of the areas are revisited in Chapter 5. To finalize this chapter, an introduction to Kalman filtering is presented.

2.3  Kalman  Filtering 

Kalman filtering is a statistically based method to update, propagate and estimate the state (mean and uncertainty) information for a system with noise. By assuming a statistical knowledge of the noise and a model of the system process, the filter estimates state information that tends to be closer to true values than a system without a Kalman filter. The filter also assumes some knowledge of the available measurements, how they relate to the process, and their accuracy. The research presented in this thesis made use of a linear Kalman filter. As an introduction, the following terms are defined:

• $\mathbf{x}$ → state vector ($n \times 1$ vector)
• $\dot{\mathbf{x}}$ → derivative of the state vector ($n \times 1$ vector)
• $\mathbf{F}$ → homogeneous, continuous-time, system-dynamics matrix ($n \times n$ matrix)
• $\mathbf{\Phi}$ → discrete-time state transition matrix ($n \times n$ matrix)
• $\mathbf{B}$ → input matrix ($n \times b$ matrix)
• $\mathbf{u}$ → input ($b \times 1$ vector)
• $\mathbf{G}$ → noise transformation matrix ($n \times q$ matrix)
• $\mathbf{w}$ → white noise processes ($q \times 1$ vector)
• $\mathbf{Q}$ → process covariance ($q \times q$ matrix)
• $\mathbf{z}$ → measurement vector ($m \times 1$ vector)
• $\mathbf{H}$ → observation matrix ($m \times n$ matrix)
• $\mathbf{v}$ → white noise processes ($m \times 1$ vector)
• $\mathbf{R}$ → measurement covariance ($m \times m$ matrix)
• $\mathbf{P}$ → state covariance ($n \times n$ matrix)
• $\mathbf{K}$ → Kalman gain ($n \times m$ matrix)

In the introduction of terms, $n$ is the number of states to be tracked, $m$ is the number of measurements available, $b$ is the number of inputs into the system, and $q$ is the number of noise sources in the system model. A subscript $k$ after the above values denotes a discretized version of the term. The noise processes are white, Gaussian, zero-mean noise (WGN) processes. A white noise source has constant power across all frequencies; the WGN is a random process with a mean of zero and a standard deviation of $\sigma$.

In a Kalman filter, the states are represented as random variables characterized by a Gaussian distribution. Gaussian distributions allow the filter to characterize the state with only two values, their mean ($\hat{\mathbf{x}}$) and their covariance ($\mathbf{P}$). The state mean is the filter’s estimate of the true value of the state, while the covariance is representative of the uncertainty in that estimate.

Without any noise in the system, and no inputs, the basic relationship between $\dot{\mathbf{x}}$ and $\mathbf{x}$ is the $\mathbf{F}$ matrix, as expressed in the differential equation $\dot{\mathbf{x}} = \mathbf{F}\mathbf{x}$. Additionally, if deterministic external influences exist in the form of inputs into the system, they relate to $\dot{\mathbf{x}}$ by the $\mathbf{B}$ matrix, or $\dot{\mathbf{x}} = \mathbf{F}\mathbf{x} + \mathbf{B}\mathbf{u}$. This relationship assumes zero noise in the system; therefore, this relationship does not introduce uncertainty. With a determined initial condition, this process model would know the state at any instant in time without any uncertainty.

Since most systems have noise of some intensity, the Kalman filter characterizes the noise in the process as a WGN process $\mathbf{w}(t)$ with covariance $\mathbf{Q}(t)$, defined as:

$$E\{\mathbf{w}(t)\mathbf{w}^T(t + \tau)\} = \mathbf{Q}(t)\delta(\tau) \tag{2.46}$$

where $\delta(\tau)$ is the Dirac delta function, and $E\{\cdot\}$ is the expectation operator. This represents zero-correlation in time of the noise source; the noise value at any instant is not dependent on the noise value at any other time.

The basic Kalman filter denotes the relationship between these terms in a continuous-time, stochastic differential equation:

$$\dot{\mathbf{x}}(t) = \mathbf{F}\mathbf{x}(t) + \mathbf{B}\mathbf{u}(t) + \mathbf{G}\mathbf{w}(t) \tag{2.47}$$

This equation models the system dynamics in continuous time. The WGN processes in the model do not actually introduce noise into the states; rather, they characterize the noise that is already present as a function of the process.

It is more common, and necessary when implementing in digital computers, to represent a model in discrete-time for implementation in a Kalman filter, such that Equation (2.47) is represented as:

$$\mathbf{x}_k = \mathbf{\Phi}_{k-1}\mathbf{x}_{k-1} + \mathbf{B}_{k-1}\mathbf{u}_{k-1} + \mathbf{w}_{k-1} \tag{2.48}$$

where $k$ is an instant in time, $k - 1$ is one instant before $k$, and $\mathbf{\Phi}_{k-1}$ is the state transition matrix for time $k - 1$. The state transition matrix is a function of $\mathbf{F}$ and the sampling interval ($\Delta t$) of the process, such that:

$$\mathbf{\Phi}(\Delta t) = e^{\mathbf{F}\Delta t} \tag{2.49}$$

Implementing this system into a Kalman filter is possible without measurement updates. The filter tracks the mean and covariance of the state, $\hat{\mathbf{x}}$ and $\mathbf{P}$. The Kalman filter makes the following predictions at every sampling interval:

$$\hat{\mathbf{x}}_k^- = \mathbf{\Phi}_{k-1}\hat{\mathbf{x}}_{k-1}^+ + \mathbf{B}_{k-1}\mathbf{u}_{k-1} \tag{2.50}$$

$$\mathbf{P}_k^- = \mathbf{\Phi}_{k-1}\mathbf{P}_{k-1}^+\mathbf{\Phi}_{k-1}^T + \mathbf{Q}_{k-1} \tag{2.51}$$

where $\hat{\mathbf{x}}_k^-$ and $\mathbf{P}_k^-$ indicate the estimation and uncertainty immediately before time $k$ (a priori), while $\hat{\mathbf{x}}_{k-1}^+$ and $\mathbf{P}_{k-1}^+$ indicate the estimation and uncertainty immediately after the previous time $k - 1$ (only referenced as an a posteriori estimate if a measurement update was incorporated). This is the propagation step of the filter: as time progresses, the uncertainty in the state estimation increases ($\mathbf{P}$ increases).

Decreasing the uncertainty in the state estimation requires the inclusion of measurements into the Kalman filter. The discrete measurement process is modeled:

$$\mathbf{z}_k = \mathbf{H}_k\mathbf{x}_k + \mathbf{v}_k \tag{2.52}$$

where $\mathbf{v}$ is a WGN process with covariance $\mathbf{R}(t)$:

$$E\{\mathbf{v}(t)\mathbf{v}^T(t + \tau)\} = \mathbf{R}(t)\delta(\tau) \tag{2.53}$$

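The key to the Kalman filter updating $\hat{\mathbf{x}}$ and $\mathbf{P}$ with measurement information is determining the Kalman gain at the current time:

$$\mathbf{K}_k = \mathbf{P}_k^-\mathbf{H}_k^T\left(\mathbf{H}_k\mathbf{P}_k^-\mathbf{H}_k^T + \mathbf{R}_k\right)^{-1} \tag{2.54}$$

The Kalman gain is based on the current covariance of the system, $\mathbf{P}$, and the covariance of the measurement, $\mathbf{R}$. Based on those values the Kalman filter updates the estimate of $\mathbf{x}$ and its uncertainty, $\mathbf{P}$, with the measurement update

$$\hat{\mathbf{x}}_k^+ = \hat{\mathbf{x}}_k^- + \mathbf{K}_k\left(\mathbf{z}_k - \mathbf{H}_k\hat{\mathbf{x}}_k^-\right) \tag{2.55}$$

$$\mathbf{P}_k^+ = \left(\mathbf{I} - \mathbf{K}_k\mathbf{H}_k\right)\mathbf{P}_k^- \tag{2.56}$$

where $\mathbf{I}$ is an identity matrix of size $n \times n$. This is the update step of the filter.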
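The propagate and update steps of Equations (2.50), (2.51) and (2.54) through (2.56) can be illustrated with a one-state filter, where every matrix reduces to a scalar. This is a minimal sketch under that simplifying assumption; the function names and the numbers in the example are hypothetical and are not taken from the filter used in this research.

```python
def propagate(x, P, phi, Q, B=0.0, u=0.0):
    """Propagation step: Equations (2.50) and (2.51) with scalar terms."""
    x_minus = phi * x + B * u
    P_minus = phi * P * phi + Q
    return x_minus, P_minus

def update(x_minus, P_minus, z, H, R):
    """Update step: Equations (2.54), (2.55) and (2.56) with scalar terms."""
    K = P_minus * H / (H * P_minus * H + R)       # Kalman gain
    x_plus = x_minus + K * (z - H * x_minus)      # corrected estimate
    P_plus = (1.0 - K * H) * P_minus              # reduced uncertainty
    return x_plus, P_plus, K

# Hypothetical values: a constant state observed directly.
x, P = 0.0, 1.0
x_minus, P_minus = propagate(x, P, phi=1.0, Q=1.0)   # uncertainty grows to 2.0
x_plus, P_plus, K = update(x_minus, P_minus, z=4.0, H=1.0, R=2.0)
```

With equal a priori and measurement variances the gain is one half, so the updated estimate lands midway between the prediction and the measurement, and the covariance drops back below its propagated value.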

These are the basic relationships of the linear Kalman filter. The process continually repeats as needed: propagate then update. Every system is unique: some include measurements at every sampling time, while others only have access to measurements periodically and continually propagate until a measurement is available. This general set of equations can be tailored to many different situations and adapted to more advanced filters (the extended Kalman filter and unscented Kalman filter are both based on these basic equations).

To  implement  the  Kalman  filter,  an  analysis  of  the  system  is  needed  to  gain 
some  knowledge  of  the  noise  in  the  system  process  and  the  measurements.  This,  in 
addition  to  a  process  and  measurement  model  of  the  system,  permits  the  Kalman 
filter  to  estimate  the  state  information  better  than  not  using  a  filter  at  all. 

This chapter has presented the background needed for this thesis. The following chapter adapts the information presented here to the AAR problem outlined in Chapter 1.
III. The Nature of Air Refueling


The general nature of air refueling is dynamic and the interactions between aircraft are non-deterministic. The dynamics of flight, compounded by aerodynamic influences between aircraft, unpredictable atmospheric and environmental conditions, variations in lighting, and partially predictable human responses, all make AR a stochastic process. In AR, the human operator innately understands the balance between the possible and the probable. The operator applies this knowledge to render accurate and timely decisions that impart motions to the aircraft based on an estimation of current conditions.

The  best  way  to  portray  this  information  to  an  autonomous  system  is  with 
models.  By  modeling  the  dynamics  of  a  tanker  aircraft,  the  autonomous  solution  can 
apply  some  of  the  same  innate  knowledge  in  an  attempt  to  match  the  effectiveness  of 
its  human  counterpart. 

This chapter has two sections: the first details some key AR assumptions applied to the development of a process model and the second presents the AR positions and the information needed to determine them for autonomous operations and render them as images in OpenGL. This chapter assumes two aircraft in formation, a lead aircraft and a wing aircraft, and no knowledge of the lead aircraft’s position or attitude.

3.1  Air  Refueling  Dynamics 

In a very basic sense, the number of degrees of freedom (DOF) of two individual aircraft is twelve. Each aircraft has six DOF, three translations from a coordinate system origin, and three attitudes with respect to that coordinate system. By placing the reference coordinate system on one of the aircraft, three DOF are eliminated because the aircraft with the coordinate system has a translation of zero in all three axes. Using the b-frame as the coordinate system, the attitude information is zero in all three axes as well. Using the n-frame as the coordinate system, the attitude information is not zero, but the system has access to the attitude of the aircraft with an INS. Using either frame reduces the total unknown DOFs to six: the attitude of the lead aircraft and the translation of the lead aircraft from the wing aircraft.

The  first  portion  of  this  section  presents  the  theoretical  and  empirical  motions 
of  a  simulated  tanker  aircraft  in  these  six  DOF  and  how  they  relate  to  AR.  The  second 
portion  builds  on  this  knowledge  to  create  a  model  of  the  lead  aircraft’s  dynamics. 


3.1.1 Air Refueling Assumptions. Of these six DOF, a few have less variation during air refueling. The first, and most identifiable, is the lead aircraft’s yaw attitude, or heading. A typical refueling consists of straight tracks, with very small heading changes, and turning tracks with constant roll-angle turns and predictable heading changes. During the straight tracks, verification of the lead aircraft’s heading is not required very often.

During turns, verification of the lead aircraft’s heading is accomplished more often. With a good estimate of the roll attitude of the lead aircraft, a change in heading (or yaw rate, $\dot{\psi}$) can be predicted [29]. An aircraft’s turn rate, often related to standard rate (3°/second) or half standard rate (1.5°/second), is dependent on the bank angle, $\phi$, and true airspeed of the aircraft, $V_T$:

$$\dot{\psi} = \frac{g \tan\phi}{V_T} \quad \left(\frac{\text{rad}}{\text{sec}}\right) \tag{3.1}$$

where $g$ is the acceleration due to gravity. This relationship assumes coordinated flight and can be used to update the estimated heading of the lead aircraft, at $\Delta t$ sampling times, through the following relationship:


$$\psi_k = \psi_{k-1} + \Delta\psi \tag{3.2}$$

where $\Delta\psi$ is the amount of change between sampling times, such that $\Delta\psi = \dot{\psi} \cdot \Delta t$. With this equation as a predictor of heading, only a slightly higher verification rate of heading is required during turning tracks, as compared to straight tracks.
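Equations (3.1) and (3.2) translate directly into a short heading predictor. The sketch below is illustrative only: the function names, the 15° roll angle and the 600 fps true airspeed are hypothetical values chosen for the example, and the gravity constant assumes units of feet and seconds to match the velocities quoted in this chapter.

```python
import math

G_FPS2 = 32.174  # standard gravity in ft/s^2 (assumed unit system)

def turn_rate(roll_rad, true_airspeed_fps, g=G_FPS2):
    """Equation (3.1): coordinated-turn yaw rate in rad/s."""
    return g * math.tan(roll_rad) / true_airspeed_fps

def propagate_heading(psi_rad, roll_rad, true_airspeed_fps, dt):
    """Equation (3.2): advance the heading estimate by one sampling interval."""
    return psi_rad + turn_rate(roll_rad, true_airspeed_fps) * dt

# Hypothetical case: predict heading over one second at 0.1 s intervals.
psi = 0.0
for _ in range(10):
    psi = propagate_heading(psi, math.radians(15.0), 600.0, 0.1)
```

Because the predictor has no measurement input, any error in the roll or airspeed estimate accumulates with every step, which is why periodic verification of the heading is still needed.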
Empirical truth data, representative of tanker maneuvers, verify the yaw motion of a lead aircraft during AR and the estimation of the heading from Equation (3.2). During the data collection for this thesis the lead aircraft flew these maneuvers up to an approximately 30° roll angle. Data were collected at 100 Hz and evaluated at 0.1 second intervals ($\Delta t$). The data were collected over a 250-second time segment during which operationally representative refueling maneuvers were performed by the lead and wing aircraft. Further details on data collection are presented in Chapter 5.

The data is shown in Figure 3.1 and demonstrates two things: the benign heading changes of the aircraft and the accuracy of the heading predictions. The aircraft’s heading with respect to the n-frame is shown as a continuous line, referenced to the left axis, and the maximum change in heading between $\Delta t$ is shown in small circles, referenced to the right axis. Additionally, the estimated heading is shown as a dotted line also referenced to the left axis.


[Plot: Tanker Heading (Flight 2, Record 12)]

Figure 3.1: Lead aircraft $\psi$ (solid line), $\Delta\psi$ (small circles), and estimated heading (dotted line). Heading, change in heading, and estimated heading during a representative profile of a tanker aircraft during a rendezvous maneuver, including an initial roll angle to roll-out. The max change values are presented as absolute values.


The change in heading is shown as absolute values. The time period of interest includes a 33° roll angle turning track in the first 25 seconds leading to a roll-out on heading, followed by a straight track. The maneuvers were hand flown by experienced pilots and the actual heading of a tanker aircraft will vary depending on the use of an autopilot and pilot skill. Based on 10 years of AR experience, it is the author’s opinion that these maneuvers represented actual refueling maneuvers.

Data analysis of this figure determined that the change in heading is approximately 0.02° per $\Delta t$ (0.2°/second) during the straight track. During the turning track the change in heading is approximately 0.15° per $\Delta t$ (1.5°/second), or half standard rate. These values will change with differing bank angles and true airspeeds flown. Additionally, the heading estimate is initially accurate over short durations. Later in the data run, without updates, the estimate drifts away from the actual true heading.

In conclusion, Figure 3.1 demonstrates that $\psi$ verification is not required often, especially during a straight track, and Equation (3.2) predicts $\psi$ as a function of $\phi$ even during turning tracks. If verifications were accomplished every two to three seconds, the lead aircraft’s heading change would be less than 0.5° during that time.

Another low-variation DOF is the lead aircraft’s pitch attitude. Tanker aircraft attempt to maintain a constant altitude during AR. A constant altitude requires small periodic motion in the pitch attitude, causing a direct influence on the velocity in the down axis in the n-frame. In Figure 3.2 the lead aircraft’s pitch attitude, with respect to the n-frame, during the same time period in flight as Figure 3.1, is shown with similar markings.

The tanker pitch range during this entire time period is 2° and the largest change in pitch was 0.04° per $\Delta t$ (0.4°/second). Even during the turn accomplished during the first 25 seconds, the pitch did not change dramatically. As an example, two aircraft, one with 7.5° and one with 5.5° pitch with respect to the n-frame, as seen from the cam-frame of a wing aircraft, are shown in Figure 3.3. Typically, a human operator is not able to recognize a difference this subtle over the time spans of interest. As a result of these AR assumptions and the empirical data, this thesis assumes that $\theta$ is constant and $\psi$ is a function of $\phi$, and thereby verifies these DOFs less frequently than the other DOFs.

[Plot: Tanker Pitch (Flight 2, Record 12); right axis: max change in pitch angle over 0.1 seconds (degrees)]

Figure 3.2: Lead aircraft $\theta$ (solid line) and $\Delta\theta$ (small circles). Pitch and change in pitch during a representative profile of a tanker aircraft during a rendezvous maneuver, including an initial roll angle to roll-out. The max change values are presented as absolute values.

Figure 3.3: Lead aircraft at $\theta$ = 7.5° and $\theta$ = 5.5°. The visual difference in an aircraft’s pitch as viewed from a camera below, looking up at approximately 30°. The image on the left is at $\theta$ = 7.5°, the right is at $\theta$ = 5.5°, both with respect to the n-frame.

The same run from the previous figures is shown depicting the down velocity of the lead aircraft with respect to the n-frame in Figure 3.4, as a direct result of the pitch inputs by the pilot (witnessed in the similarity between the peaks and values of Figures 3.2 and 3.4). The lead aircraft was generally within 10 feet per second (fps) of holding constant altitude (or zero fps). Additionally, the change in down velocity was typically less than 0.2 fps per $\Delta t$. The relationship between down velocity and $\theta$ is non-linear, but over short periods a linear approximation can be made. The algorithm presented in this thesis does not leverage this relationship; however, by accurately predicting one of the two DOFs an estimate of the other is possible, in a similar manner to the $\phi$ and $\psi$ relationship shown earlier.

Of the two remaining translation DOFs, north and east velocities, the same data is shown for east velocity only in Figure 3.5. Both are similar, and the difference between actual values depends only on current heading, $\psi$. These two DOFs have larger variation than the down velocity, but together they are constrained by the fairly constant true airspeed of the lead aircraft.

[Plot: Tanker Down Velocity (Flight 2, Record 12)]

Figure 3.4: Lead aircraft $V_{down}$ (solid line) and $\Delta V_{down}$ (small circles). Down velocity and change in down velocity during a representative profile of a tanker aircraft. The max change values are presented as absolute values.

[Plot: Tanker East Velocity (Flight 2, Record 12)]

Figure 3.5: Lead aircraft $V_{east}$ (solid line) and $\Delta V_{east}$ (small circles). East velocity and change in east velocity during a representative profile of a tanker aircraft. The max change values are presented as absolute values.

Finally, the most dynamic DOF during AR is roll, shown in Figure 3.6. The majority of the time this attitude stays constant; however, large changes periodically occur that must be verified more frequently. Based on these lead aircraft assumptions and empirical data, the next section presents a dynamic model of the process.

[Plot: Tanker Roll (Flight 2, Record 12)]

Figure 3.6: Lead aircraft $\phi$ (solid line) and $\Delta\phi$ (small circles). Roll and change in roll during a representative profile of a tanker aircraft. The max change values are presented as absolute values.
3.1.2 Air Refueling Model. The Kalman filter presented in Section 2.3 requires an estimated model of a system’s dynamics. Analyzing the motion of the aircraft determines an approximate model for the filter. This section analyzes the empirical data of a lead aircraft with respect to the n-frame and determines that model. Many of the coordinate frames presented in Chapter 2 are justifiable for this analysis. The n-frame was chosen for its proximity to the aircraft and independence from the motion of the wing aircraft. The created model is based on the dynamics of the lead aircraft in the n-frame and can be tailored to the other frames, based on the knowledge of the wing aircraft’s motion.

Analysis of the data in the previous figures, along with other similar data runs (15 total), provides the data needed for creation of a model of the lead aircraft in the n-frame. Analysis of the magnitude and frequency characteristics of the data leads to a model that approximates the motion, or potential motion, of the aircraft.

The plots in Figure 3.7 are a culmination of 15 empirical runs. Figure 3.7a shows the velocities in the representative axis. These are the same plots as the previous figures of NED velocity, without the change per $\Delta t$ shown. Figure 3.7b is a similar plot of the accelerations of the lead aircraft in the n-frame. The acceleration data is noisier than the velocity data. The analysis of this data is shown in Table 3.1.

The north and east velocities show large variation about mean values that are not similar to each other. Down velocity is centered near zero with a low standard deviation. All the accelerations have a near-zero mean. Since the Kalman filter requires the modeling of the noise in the system to be WGN, the accelerations and potentially the down velocity can be modeled as noise sources for the Kalman filter. However, the accelerations for north and east do not actually appear to be white noise sources; they appear to have a random but constrained motion to them, which is not a characteristic of a white noise source. To help further characterize this motion, the data from the runs were placed in a Power Spectral Density (PSD) plot shown in Figures 3.8 (a) and 3.8 (b).




[Plots: (a) Flight 2, 15 Runs, NED Velocities; (b) Flight 2, 15 Runs, NED Accelerations]

Figure 3.7: Velocity and acceleration motion of a lead aircraft.
(a) The aircraft’s NED velocities for 15 runs; data analysis shown in Table 3.1.
(b) The aircraft’s NED accelerations for 15 runs; data analysis shown in Table 3.1.


NED Direction   Mean velocity (fps)   STD velocity (fps)   Mean acceleration (fps^2)   STD acceleration (fps^2)
North           -36.0                 186.0                 0.3                        5.0
East             -6.0                 183.0                -0.3                        4.0
Down             -0.1                   4.0                 0.0                        0.7

Table 3.1: Analysis of empirical data. STD is the standard deviation of the value. Absolute values greater than one were rounded to the nearest integer.
[Plots: (a) Flight 2, 15 Runs, NED Velocities PSD; (b) Flight 2, 15 Runs, NED Accelerations PSD; vertical axes: North, East, and Down Magnitude (dB)]

Figure 3.8: PSD of the lead aircraft’s velocity and acceleration data. A PSD of the culmination of data runs details the magnitude and frequency of the aircraft’s motion.




The velocity PSDs are close to a straight line (similar to a PSD of integration). The acceleration PSDs show a break point common to all three axes around 30 rad/s. This information, coupled with the data from Table 3.1, defines a possible model for the tanker. A first-order Gauss-Markov (FOGM) is a process that can represent the accelerations as a noise source with two values: a time constant and a variance [4]. For a generic acceleration, $a$, an example FOGM process is shown:


$$\dot{a} = -\frac{1}{T} a + w(t) \tag{3.3}$$

where $T$ is the time constant, and $w(t)$ is a WGN such that:

$$E\{w(t) w^T(t + \tau)\} = Q(t)\delta(\tau) \tag{3.4}$$


where $Q$ is the variance of the noise. Modeling a system state as a FOGM process represents a time-related statistical limit to the variability of the state. The FOGM introduces the concept of a time-correlated random walk, best illustrated with a counterexample shown in Figure 3.7b. The down acceleration is a process that does not immediately appear to have limited variability in its values. The down acceleration has indiscriminately-sized random motions over short periods of time. In fact, the down acceleration could be modeled directly as a WGN source, with a covariance equal to the square of the standard deviation determined from the empirical data:

$$\ddot{p}_z^n = w(t) \tag{3.5}$$

$$E\{w(t) w^T(t + \tau)\} = \sigma^2\delta(\tau) \tag{3.6}$$

where $\ddot{p}_z^n$ is the change in down acceleration of lead in the n-frame. This is the equivalent of zero correlation in time between any two values of $\ddot{p}_z^n$. Modeling a state as a noise source in this manner portrays to the Kalman filter that the value of the state will most likely (statistically) be anywhere between $\pm\sigma$ at any given instant in time and that the value has no dependence on time; in other words, the value at one instant in time is not dependent on the value at any other instant in time.

In a FOGM process, a value in time is dependent on other values in time. This is demonstrated in Figure 3.7 (b). Both the north and east accelerations have a time-correlated random walk. The accelerations do not jump to the extremes of the figure; instead, they are limited to smaller motions near their previous values. As seen in the figure, over time, they can randomly walk to the extremes of the figure. With a time constant incorporated into the model of the system, the filter assumes a statistically-based limitation to the change in value with respect to time.

Finally, since the PSD of the velocities is similar to an integration, the velocities
are modeled as an integration of acceleration and position as a double integration
of acceleration:


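$$ \frac{d}{dt}\,p^{n}_{N} = \dot{p}^{n}_{N} \tag{3.7} $$

$$ \frac{d}{dt}\,\dot{p}^{n}_{N} = \ddot{p}^{n}_{N} \tag{3.8} $$

$$ \frac{d}{dt}\,\ddot{p}^{n}_{N} = -\frac{1}{T}\,\ddot{p}^{n}_{N} + w(t) \tag{3.9} $$

$$ \frac{d}{dt}\begin{bmatrix} p^{n}_{N} \\ \dot{p}^{n}_{N} \\ \ddot{p}^{n}_{N} \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & -\frac{1}{T} \end{bmatrix} \begin{bmatrix} p^{n}_{N} \\ \dot{p}^{n}_{N} \\ \ddot{p}^{n}_{N} \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} w(t) \tag{3.10} $$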


where $p^{n}_{N}$ is the north change in position of lead in the n-frame. Of note, the w(t)
noise is still a WGN process; it is shaped into a FOGM process with the use of the
time constant.

The down acceleration was shown as a process that might be better modeled as
a WGN source; however, it has a corner frequency similar to the other accelerations.
Additionally, climbs and descents were not accomplished on these runs; those types
of motions would be better modeled with a FOGM process as well.

The acceleration PSD plots in Figure 3.8 had a similar corner frequency of
approximately 30 rad/s. The time constant of the FOGM is the inverse of the corner
frequency, T = 1/30 s. Since there is a non-unity gain shown in the PSD, the following
relationship is used for the covariance of the noise:

$$ Q = \frac{2\sigma^{2}}{T} \tag{3.11} $$

where $\sigma$ is the standard deviation determined in Table 3.1 [4].

As verification, analysis of 30 Monte Carlo runs of this dynamic process model
is shown as a PSD of the FOGM in Figure 3.9. Of these 30 runs, the mean acceleration
was zero and the standard deviation was five, matching the values from the
empirical data. The FOGM has more gain than the empirical data in the one to ten
rad/s range. This extra noise portrays to the filter a dynamic process less accurate
than the actual process. The uncertainty in the state estimate will grow faster than
the actual accuracy of the process, possibly weighting the measurements more than
they should be during measurement updates. However, this also portrays to the filter
that larger deviations from the state estimate are reasonable, allowing motions larger
than $\sigma$ to be accepted with less uncertainty. This extra noise should not be a concern
for this process; a higher-order model of the aircraft motion would possibly match the
system dynamics better.
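
The Monte Carlo check described above can be sketched in a few lines. The following is a minimal illustration, not the code used in this research: it assumes the standard FOGM relation Q = 2σ²/T, uses the exact discrete-time equivalent of the scalar process, and picks a hypothetical 0.01 s step and run length. The function name `simulate_fogm` and all numeric settings other than σ = 5 and T = 1/30 s are assumptions.

```python
import numpy as np

def simulate_fogm(sigma, T, dt, n_steps, rng):
    """Simulate a scalar FOGM process a' = -a/T + w(t), with Q = 2*sigma**2/T."""
    phi = np.exp(-dt / T)              # state transition over one time step
    q_d = sigma**2 * (1.0 - phi**2)    # discrete driving-noise variance
    a = np.empty(n_steps)
    a[0] = rng.normal(0.0, sigma)      # start in the steady-state distribution
    for k in range(1, n_steps):
        a[k] = phi * a[k - 1] + rng.normal(0.0, np.sqrt(q_d))
    return a

# 30 runs, sigma = 5 and T = 1/30 s as in the text; dt and length are assumed
rng = np.random.default_rng(0)
runs = np.array([simulate_fogm(5.0, 1.0 / 30.0, 0.01, 5000, rng)
                 for _ in range(30)])
fogm_mean = runs.mean()
fogm_std = runs.std()
```

With these settings the sample mean should sit near zero and the sample standard deviation near five, which is the property the verification relies on.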

Assuming the only difference between north and east velocities is current heading,
this FOGM model will be adequate for those two DOFs. The down component
shares the same time constant with a lower $\sigma$. Finally, a similar analysis was
conducted for roll. Unfortunately, only the roll attitude was collected empirically. Based
upon the empirical data, the mean roll angle was 3.5° with a standard deviation of
6.0°. Inferred from empirical data was a mean roll rate of 0.0°/sec with a standard
deviation of 0.5°/sec. As a result, the roll rate was modeled as a FOGM as well, but
a time constant for the roll was not determined from the empirical data.

Figure 3.9: PSD of 30 Monte Carlo runs of the FOGM. A model of the dynamics of the tanker
approximates the possible motions of the aircraft. The mean acceleration of all the
runs was zero, with a standard deviation of five, matching the empirical data.

The following continuous-time process model represents the lead aircraft in the
n-frame:

where $0_{M}$ is an $M \times M$ matrix of zeros, $0_{M \times N}$ is an $M \times N$ matrix of zeros, $I_{M}$ is an $M \times M$
identity matrix, one standard deviation applies to north and east, a second standard
deviation applies to down, and the remaining two parameters will be determined through the tuning of the
filter. To apply the Kalman filter equations, the matrices are converted to discrete-time
matrices to resemble Equation (2.48). This continuous-time, dynamic-process
model is valid for tracking a lead aircraft performing tanker-type maneuvers in the
n-frame. Accounting for the dynamics of another aircraft updates this model to track
in another frame, such as the $b_W$-frame or cam-frame of a wing aircraft. The tracking
accomplished in this report, the lead aircraft in the cam-frame of the wing aircraft,
accounted for the additional motion of the wing aircraft by simply modifying the time
constants and standard deviations used in the model.
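
As an illustration of the continuous-to-discrete conversion, the sketch below discretizes a single-axis position, velocity, acceleration model driven by a FOGM acceleration. It uses Van Loan's method, which is one common technique and is not necessarily the one used in this research. The helper names `expm_series` and `discretize_axis`, the 0.01 s step, and the relation Q = 2σ²/T are assumptions made for the example.

```python
import numpy as np

def expm_series(A, terms=30):
    """Matrix exponential by truncated Taylor series (fine for small A)."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        result = result + term
    return result

def discretize_axis(T, sigma, dt):
    """Van Loan discretization of the single-axis [p, v, a] FOGM model."""
    F = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [0.0, 0.0, -1.0 / T]])
    G = np.array([[0.0], [0.0], [1.0]])
    Q = 2.0 * sigma**2 / T                 # assumed FOGM noise strength
    n = F.shape[0]
    M = np.zeros((2 * n, 2 * n))
    M[:n, :n] = -F
    M[:n, n:] = G @ G.T * Q
    M[n:, n:] = F.T
    E = expm_series(M * dt)
    Phi = E[n:, n:].T                      # discrete state transition matrix
    Qd = Phi @ E[:n, n:]                   # discrete process noise covariance
    return Phi, Qd

Phi, Qd = discretize_axis(T=1.0 / 30.0, sigma=5.0, dt=0.01)
```

The acceleration entry of `Phi` reduces to exp(-dt/T), which is a quick sanity check on the result.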

The dynamics of the lead aircraft, including a model to represent those dynamics,
have been presented. The next section details the formation positions of AR and the
information required to determine those positions without knowledge of the
lead aircraft's position or attitude.

3.2  Formation  Position 

The premise of the algorithm in this thesis is to determine the location of the lead
aircraft in the cam-frame on a wing aircraft in order to determine where the receiver
is in relation to the $b_L$-frame as presented in Section 2.1.4. A few of the various AR
positions are introduced as positions in the $b_L$-frame to understand the requirements
to effect AAR and to quantify the precision needed at the various positions.

For this research, the lead aircraft (a T-38 as a simulated tanker) had a prominent
rotating beacon underneath the aircraft that was visible in the images collected
by the wing aircraft (an LJ-24 as a simulated receiver). This beacon was located
approximately 17.0 feet from the nose of the aircraft, or 29.0 feet from the tail of the
aircraft. The beacon was designated as the origin for both the $b_L$-frame and $n_L$-frame,
denoted as a cylinder (approximate shape of the beacon) in Figure 3.10.

The wing aircraft had a length of approximately 43.0 feet and the coordinate
frames, $b_W$ and $n_W$, originated at the center of the truth data collection device
(for simple determination of errors in the algorithm), denoted as a cube in Figure 3.10.
This device was located approximately 23.7 feet from the nose of the wing aircraft
and one foot left of centerline. With a coordinate system defined, aircraft within the
formation can navigate with respect to predefined positions in the $b_L$-frame.

3.2.1 Rendezvous Position. When aircraft are not in visual contact with
each other or otherwise knowledgeable about the other aircraft's position, the wing
aircraft proceed to and/or maintain a position that provides both lateral and vertical
separation between aircraft. This position allows for safe maneuvering until the wing
aircraft can attain visual contact with lead and initiate a rejoin or rendezvous. The
definition of this position is 1,000 feet below and 1 nautical mile (NM) in trail of
the lead aircraft. Typically, this is a transient position and is held constant only when
the lead aircraft is not in sight. The position translations are no-closer-than values:
a receiver aircraft cannot proceed closer than these values without visual contact with the
tanker. In the $b_L$-frame, this position can be represented as a position vector
with the following coordinates in feet:

$$ p^{b_L}_{1NM} = \begin{bmatrix} -6076 & 0 & 1000 \end{bmatrix}^{T} \tag{3.14} $$


At this distance, the reference frame origin locations of each aircraft are not a
significant influence on this position.


3.2.2 Pre-Contact Position. After passing the rendezvous position with the
lead aircraft in sight, the wing aircraft proceeds to a position known as pre-contact.
The wing aircraft reduces the separation between the aircraft both vertically and
horizontally in a straightforward maneuver. The wing aircraft initiates the maneuver
with 20 knots greater airspeed than the lead aircraft and reduces the closure rate to
zero as a climb to lead's altitude is accomplished. The definition of this position is
50 feet aft of lead and slightly below. The position is dependent on the size of the
aircraft and where the appropriate coordinate frames are located; 50 feet describes
the distance from the nose of the wing aircraft to the tail of the lead aircraft.

The down component is approximated by using a 30° aspect angle measured
from the $b_L$-frame's negative X axis in the direction of its positive Z axis. This
position can be represented as a position vector with the following coordinates:

$$ p^{b_L}_{PRE} = \begin{bmatrix} -96 & -1 & 25 \end{bmatrix}^{T} \tag{3.15} $$

At this distance the origin locations are accounted for in the position definition.
The negative one foot offset in the Y axis of the $b_L$-frame accounts for the off-centerline
location of the two coordinate frames' origin on the wing aircraft.
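
The pre-contact coordinates can be reproduced from the geometry given above. The sketch below is one reading of that geometry, not a formula stated in this thesis: it assumes the 50 feet is a slant range along the 30° aspect angle, and it adds the beacon-to-tail distance on lead and the origin-to-nose distance on the wing aircraft to the X component. The function name `formation_position` and the constant names are hypothetical.

```python
import math

BEACON_TO_TAIL_FT = 29.0     # lead origin (beacon) to lead tail
ORIGIN_TO_NOSE_FT = 23.7     # wing origin (truth device) to wing nose
ORIGIN_LEFT_OF_CL_FT = 1.0   # wing origin offset left of centerline

def formation_position(slant_ft, aspect_deg):
    """Wing-origin position in the lead body frame (X forward, Y right, Z down)."""
    aspect = math.radians(aspect_deg)
    x = -(BEACON_TO_TAIL_FT + slant_ft * math.cos(aspect) + ORIGIN_TO_NOSE_FT)
    y = -ORIGIN_LEFT_OF_CL_FT
    z = slant_ft * math.sin(aspect)
    return (x, y, z)

# Pre-contact: 50 ft nose-to-tail separation on a 30 degree aspect angle
p_pre = formation_position(50.0, 30.0)
```

Rounded to the nearest foot, `p_pre` agrees with the coordinates in Equation (3.15).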

Figure 3.10: AAR coordinate system. The coordinate systems used for AAR, demonstrated
in the pre-contact position ($p^{b_L}_{PRE}$) (Note: not to scale)

The  pre-contact  position  is  a  static  position  that  a  wing  aircraft  maintains  for 
a  period  of  time  and  is  used  for  a  few  tasks.  First,  it  allows  the  wing  aircraft  to 
stabilize  while  matching  the  lead  aircraft’s  airspeed.  Second,  it  demonstrates  to  the 
boom  operator  on  the  lead  aircraft  that  the  pilot  of  the  wing  aircraft  is  under  control 
and  it  is  safe  to  approach.  Third,  it  allows  the  pilot  of  the  wing  aircraft  to  prepare 
for the impending maneuver. The position translations are no-closer-than values.

3.2.3 Contact Position. Once the wing aircraft sustains a stable pre-contact
position, the wing aircraft is cleared to a contact position, denoted as $p^{b_L}_{CONTACT}$.
With a desired one foot per second closure rate, maneuvering from pre-contact to
contact should last 30-60 seconds. The contact position is defined as a position on a
30° aspect angle measured from the $b_L$-frame's negative X axis in the direction of its
positive Z axis. For a KC-135 aircraft, this position is denoted as 12 feet slant range
from the end of the refueling boom (before extension) of the lead aircraft to the Universal
Air Refueling Receptacle Slipway Installation (UARRSI) of the wing aircraft.
This is the fuel port where the boom of the lead aircraft connects to transfer fuel. For
this research, the UARRSI is approximated with the origin of the cam-frame and the
end of the refueling boom is approximated with the tail of the lead aircraft; however,
the position is still defined to the coordinate-frame origins of the wing aircraft:


P CONTACT 


(3.16) 


This  position  is  intended  to  be  held  statically,  but  for  human  operators,  it  is 
often  challenging  to  do  so.  Changing  conditions  in  airspeed,  altitude,  attitude,  wind, 
visibility,  turbulence,  etc.,  require  constant  inputs  to  the  controls  of  the  wing  aircraft, 
requiring  constant  attention  and  focus.  A  maneuvering  envelope  about  the  position 
permits  fluctuations,  inherent  in  this  position.  For  a  KC-135  aircraft  the  envelope 
allows  the  aspect  angle  to  vary  from  20°  to  40°.  The  distance  envelope  is  6  to  18  feet 
and  the  azimuth  angle  (degrees  left  and  right  from  center)  is  allowed  to  vary  up  to 
10°. An experienced pilot does not require the full envelope; however, student pilots
in initial training will approach and often exceed these limits, causing a disconnect (either
manually or automatically initiated) of the boom from the UARRSI. Exceeding these
limits can lead, and has led, to severe aircraft damage and loss of life.
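
The envelope limits quoted above lend themselves to a simple bounds check. The sketch below is illustrative only: the function names are hypothetical, and the way slant range, aspect angle, and azimuth are derived from an aft/right/down offset is an assumed convention rather than one defined in this thesis.

```python
import math

def in_contact_envelope(slant_ft, aspect_deg, azimuth_deg):
    """True if the receiver is inside the KC-135 limits quoted in the text."""
    return (6.0 <= slant_ft <= 18.0
            and 20.0 <= aspect_deg <= 40.0
            and abs(azimuth_deg) <= 10.0)

def boom_relative_geometry(aft_ft, right_ft, down_ft):
    """Convert an offset from the boom end into (slant, aspect, azimuth).

    Assumed convention: aspect is measured from the aft axis toward down,
    and azimuth is the left/right angle out of the aft-down plane.
    """
    slant = math.sqrt(aft_ft**2 + right_ft**2 + down_ft**2)
    aspect = math.degrees(math.atan2(down_ft, aft_ft))
    azimuth = math.degrees(math.atan2(right_ft, math.hypot(aft_ft, down_ft)))
    return slant, aspect, azimuth

# Nominal contact position: 12 ft slant range on a 30 degree aspect angle
nominal = boom_relative_geometry(12.0 * math.cos(math.radians(30.0)), 0.0,
                                 12.0 * math.sin(math.radians(30.0)))
nominal_ok = in_contact_envelope(*nominal)
```

A monitor of this kind could flag an impending disconnect before a limit is reached, although any margin applied to the limits would be a separate design decision.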

Figure 3.11 shows the envelope with a B-52 aircraft in the contact position.
The distance from the base of the boom to the end of the refueling boom (before
the extension) is approximately 27 feet, 7 inches, and the slant range envelope starts
6 feet, 1 inch aft of that or the tip of the extension (33 feet, 8 inches total). The
envelope extends 12 feet 3 inches, allowing a full envelope of 6 to 18 feet slant range.

Figure 3.11: KC-135 refueling envelope [14]. This research assumes this envelope.


3.3 Position Realization

It is possible to train a human pilot to recognize the above formation positions
and determine deviations from those positions using both visual and aircraft instrumentation
cues. From the author's AR instructing experience, experienced aviators
can typically estimate the contact position within 1 to 2 feet of accuracy and the
pre-contact position within 10 to 20 feet of accuracy. Pilots often rely on aircraft
instrumentation (such as radar) for distances further than 3,000 feet range, with a
transition to visual-only references at closer distances. For a computer-vision solution,
determining these positions is more involved. Starting with the most general
difference equation between two navigating bodies, this section presents a derivation
that details the information needed to determine the position of the wing aircraft in
the $b_L$-frame:

$$ p^{i}_{L \Rightarrow W} = p^{i}_{W} - p^{i}_{L} \tag{3.17} $$

where the symbol $p^{i}_{L \Rightarrow W}$ denotes the position vector from the lead aircraft to the
wing aircraft, referenced in the i-frame. This equation relates the difference in inertial
position of the aircraft in the i-frame. It is generally accepted that navigation on the
Earth's surface and in the sub-orbital atmosphere can be accomplished in the e-frame. This
is the navigation frame used by aircraft with the aid of GPS and is the frame used in
this research:

$$ p^{e}_{L \Rightarrow W} = p^{e}_{W} - p^{e}_{L} \tag{3.18} $$

If the rotation between the e-frame and $b_L$-frame is known, the difference can
be rotated into the $b_L$-frame with the DCM $C^{b_L}_{e}$:

$$ p^{b_L}_{L \Rightarrow W} = C^{b_L}_{e}\,(p^{e}_{W} - p^{e}_{L}) \tag{3.19} $$

The DCM $C^{b_L}_{e}$ is separated into realizable transformations with the following
results:

$$ p^{b_L}_{L \Rightarrow W} = C^{b_L}_{n_L} C^{n_L}_{n_W} C^{n_W}_{e}\,p^{e}_{W} - C^{b_L}_{n_L} C^{n_L}_{e}\,p^{e}_{L} \tag{3.20} $$

Critical navigation will only be required when the aircraft are relatively close
to each other, so the e-frame to n-frame conversion of both aircraft is assumed to be
equal ($C^{n_W}_{e} = C^{n_L}_{e}$). Furthermore, the position errors resulting from this assumption
will not add significantly to the findings in this research. If, in the future, the approach
presented here can reduce the error below an appropriate threshold, then this
assumption can be re-evaluated. To reduce confusion, these two frames have been
replaced with a general n-frame:

$$ C^{n}_{e} = C^{n_W}_{e} = C^{n_L}_{e} \tag{3.21} $$

$$ p^{b_L}_{L \Rightarrow W} = C^{b_L}_{n} C^{n}_{e}\,(p^{e}_{W} - p^{e}_{L}) \tag{3.22} $$

Dealing with a frame located at the origin of the lead aircraft allows the subscript
L to be dropped; it is assumed that the position vector is from the frame origin to
the identifier subscript:

$$ p^{b_L}_{W} = C^{b_L}_{n} C^{n}_{e}\,(p^{e}_{W} - p^{e}_{L}) \tag{3.23} $$

This is the basic equation needed for a DGPS approach and requires the orientation
of the lead aircraft to compute the n-frame to $b_L$-frame conversion ($C^{b_L}_{n}$). The
wing aircraft has access to its own position in the e-frame, $p^{e}_{W}$, and also the rotation
$C^{n}_{e}$ as a function of $p^{e}_{W}$. By broadcasting the lead aircraft's navigation position ($p^{e}_{L}$)
and attitude ($C^{b_L}_{n}$) to the wing aircraft, the wing aircraft can compute its position
relative to the lead aircraft and navigate successfully.
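
The DGPS relation can be exercised numerically. The sketch below is a minimal example under stated assumptions: the n-frame is taken as north-east-down, the e-frame as Earth-centered Earth-fixed, and the lead attitude as a yaw-pitch-roll (3-2-1) Euler sequence. The function names are hypothetical and the positions are made-up test values.

```python
import numpy as np

def dcm_e_to_n(lat_rad, lon_rad):
    """Rotation from the e-frame (ECEF) to a local north-east-down n-frame."""
    sl, cl = np.sin(lat_rad), np.cos(lat_rad)
    so, co = np.sin(lon_rad), np.cos(lon_rad)
    return np.array([[-sl * co, -sl * so, cl],
                     [-so, co, 0.0],
                     [-cl * co, -cl * so, -sl]])

def dcm_n_to_b(yaw, pitch, roll):
    """Rotation from the n-frame to a body frame, 3-2-1 Euler sequence."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    return np.array([
        [cp * cy, cp * sy, -sp],
        [sr * sp * cy - cr * sy, sr * sp * sy + cr * cy, sr * cp],
        [cr * sp * cy + sr * sy, cr * sp * sy - sr * cy, cr * cp]])

def wing_in_lead_body(p_w_e, p_l_e, lat, lon, yaw, pitch, roll):
    """Wing position in the lead body frame from e-frame positions (DGPS form)."""
    return dcm_n_to_b(yaw, pitch, roll) @ dcm_e_to_n(lat, lon) @ (p_w_e - p_l_e)

# Lead heading east, wings level; wing 96 ft west of and 25 ft below lead
lat, lon = 0.6, -1.4
p_l_e = np.array([1.0e6, -4.0e6, 3.0e6])           # arbitrary lead position
offset_ned = np.array([0.0, -96.0, 25.0])
p_w_e = p_l_e + dcm_e_to_n(lat, lon).T @ offset_ned
p_w_bl = wing_in_lead_body(p_w_e, p_l_e, lat, lon, np.pi / 2, 0.0, 0.0)
```

In this example the wing aircraft ends up directly behind and below lead in the lead body frame, as expected for a trail position.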

For justifications addressed in Chapter 1, it would be useful if a camera and its
associated reference frame, cam-frame, could provide an alternative approach to this
equation. The following equations determine the information needed for
this alternate approach. In a similar manner to the DGPS equation derivation shown
above, the following equations are presented:

$$ p^{cam}_{CAM \Rightarrow L} = C^{cam}_{n} C^{n}_{e}\,(p^{e}_{L} - p^{e}_{CAM}) \tag{3.24} $$

$$ p^{b_W}_{W \Rightarrow CAM} = C^{b_W}_{n} C^{n}_{e}\,(p^{e}_{CAM} - p^{e}_{W}) \tag{3.25} $$


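Solving for $p^{e}_{CAM}$ in Equation (3.24):

$$ p^{e}_{CAM} = p^{e}_{L} - C^{e}_{n} C^{n}_{cam}\,p^{cam}_{CAM \Rightarrow L} \tag{3.26} $$

Substituting Equation (3.26) into Equation (3.25) and solving for $p^{e}_{W}$ results in:

$$ p^{e}_{W} = p^{e}_{L} - C^{e}_{n} C^{n}_{cam}\,p^{cam}_{CAM \Rightarrow L} - C^{e}_{n} C^{n}_{b_W}\,p^{b_W}_{W \Rightarrow CAM} \tag{3.27} $$

Substituting $p^{e}_{W}$ from Equation (3.27) into Equation (3.22) results in:

$$ p^{b_L}_{L \Rightarrow W} = -C^{b_L}_{n}\,(C^{n}_{cam}\,p^{cam}_{CAM \Rightarrow L} + C^{n}_{b_W}\,p^{b_W}_{W \Rightarrow CAM}) \tag{3.28} $$

The DCM $C^{n}_{cam}$ is separated into realizable transformations:

$$ p^{b_L}_{L \Rightarrow W} = -C^{b_L}_{n} C^{n}_{b_W}\,(C^{b_W}_{cam}\,p^{cam}_{CAM \Rightarrow L} + p^{b_W}_{W \Rightarrow CAM}) \tag{3.29} $$

Again, the subscripts can be dropped:

$$ p^{b_L}_{W} = -C^{b_L}_{n} C^{n}_{b_W}\,(C^{b_W}_{cam}\,p^{cam}_{L} + p^{b_W}_{CAM}) \tag{3.30} $$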

Instead of using the e-frame locations of the aircraft, this approach makes use
of the position of the camera in the $b_W$-frame ($p^{b_W}_{CAM}$) and the estimated position
of lead in the cam-frame ($p^{cam}_{L}$). These positions are converted into a common reference
frame, summed, and converted into the $b_L$-frame to determine the estimated
position of the wing aircraft in the $b_L$-frame ($p^{b_L}_{W}$). Critical components of Equations
(3.23) and (3.30) are shown graphically in Figure 3.12.
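
The camera-based relation can be sketched in the same way. The example below is a minimal illustration, not the implementation of this research: it treats the vector from lead to the wing origin as the negative of the sum of the wing-to-camera and camera-to-lead vectors once both are rotated into a common frame. The function and argument names are hypothetical, and the identity rotations and lever arm are made-up test values.

```python
import numpy as np

def wing_in_lead_body_from_camera(C_bL_from_n, C_n_from_bW, C_bW_from_cam,
                                  p_L_cam, p_CAM_bW):
    """Wing position in the lead body frame from a camera-frame lead position.

    C_bL_from_n   : rotation from the n-frame to the lead body frame
    C_n_from_bW   : rotation from the wing body frame to the n-frame
    C_bW_from_cam : rotation from the cam-frame to the wing body frame
    p_L_cam       : lead position in the cam-frame
    p_CAM_bW      : camera position in the wing body frame
    """
    camera_to_lead_bW = C_bW_from_cam @ p_L_cam
    wing_to_lead_bW = camera_to_lead_bW + p_CAM_bW
    return -C_bL_from_n @ C_n_from_bW @ wing_to_lead_bW

# Aligned frames (an assumption for the example): lead seen 90 ft ahead of and
# 25 ft above a camera mounted 6 ft forward and 1 ft right of the wing origin
eye = np.eye(3)
p_w_bl = wing_in_lead_body_from_camera(eye, eye, eye,
                                       np.array([90.0, 0.0, -25.0]),
                                       np.array([6.0, 1.0, 0.0]))
```

Note that only the lead attitude and the camera-frame position of lead would have to be estimated from images; the other inputs are available on the wing aircraft.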


Figure 3.12: The critical components required for autonomous air refueling navigation.
$p^{b_L}_{W}$ can be determined from the difference of $p^{e}_{W}$ and $p^{e}_{L}$ through the use of differential
GPS or the summing of $p^{cam}_{L}$ and $p^{b_W}_{CAM}$ through the use of image-aided
relative-formation navigation.


In Equation (3.30), $p^{b_W}_{CAM}$ and $C^{b_W}_{cam}$ are assumed constant and known before
flight. In contrast to the information needed in Equation (3.23), the DCM $C^{n}_{b_W}$
is needed and will be known from an on-board Inertial Navigation System (INS).
However, without a broadcast transmission from the lead aircraft, a wing aircraft will
not have access to two components of this equation: the three Euler angles of the lead
aircraft to compute $C^{b_L}_{n}$, or $p^{cam}_{L}$. Estimating these two components using the images
provided from the on-board camera and the INS of the wing aircraft is the focus of
this research.

The final section outlines transferring these position realizations into the OpenGL
world. The algorithm in this thesis creates images based on the Kalman filter's
estimation of the two required components of Equation (3.29), and the following section
demonstrates how potential positions and attitudes of the tanker are created as
images.

3.3.1 Rendering Positions. At its simplest, the OpenGL rendering process
requires six parameters: the position of an object in the GLcam-frame (three) and
its rotation with respect to that frame (three). OpenGL allows the object to be
rotated first and then translated, or translated first and then rotated, resulting in much
different scenes and images. To adhere to the coordinate-reference-frame translation
and rotation process detailed in Section 2.1.3, the rendering process shown here will
always translate first and then rotate. The rotation between the GLcam-frame and
the b-frame of an object is accomplished in the following manner: rotation about
the $Y_{GLcam}$ axis, rotation about the $Y_{GLcam}$ axis, and finally rotation about the
$Z_{GLcam}$ axis. Additionally, since this process is intended to represent actual images
collected by the camera, the cam-frame and GLcam-frame share the same origin as
demonstrated in Section 2.2.1.2.
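
The effect of the operation order can be seen with plain homogeneous matrices. The sketch below uses NumPy as a stand-in for OpenGL calls and is not the rendering code of this research; the helper names and the numeric values are hypothetical. In the fixed-function OpenGL pipeline each transformation call post-multiplies the current matrix, so issuing a translation and then a rotation corresponds to the product T·R applied to each vertex.

```python
import numpy as np

def translation(tx, ty, tz):
    """4x4 homogeneous translation matrix."""
    M = np.eye(4)
    M[:3, 3] = [tx, ty, tz]
    return M

def rotation_z(angle_rad):
    """4x4 homogeneous rotation about the Z axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    M = np.eye(4)
    M[:2, :2] = [[c, -s], [s, c]]
    return M

T = translation(50.0, 0.0, -100.0)
R = rotation_z(np.pi / 2)
vertex = np.array([10.0, 0.0, 0.0, 1.0])   # e.g. a point on the model

# Calls issued as translate, then rotate: the model turns about its own origin
translate_then_rotate = T @ R @ vertex
# Calls issued as rotate, then translate: the model swings about the camera
rotate_then_translate = R @ T @ vertex
```

The two results place the same vertex at different locations in the scene, which is why a single convention has to be fixed before rendered and collected images can be compared.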

The necessary attitude information of the lead aircraft is different than that
required for the navigation solution in Section 3.3. Instead, the attitude of the lead
aircraft with respect to the GLcam-frame, or $C^{b_L}_{GLcam}$, is needed. This DCM can
be computed by combining four different DCMs: two created from the Euler angles
of each aircraft ($C^{b_L}_{n}$ and $C^{n}_{b_W}$), one determined before flight (the rotation of the
wing aircraft $b_W$-frame origin to camera, $C^{b_W}_{cam}$), and the transformation between the
cam-frame and GLcam-frame ($C^{cam}_{GLcam}$), shown:

$$ C^{b_L}_{GLcam} = C^{b_L}_{n} C^{n}_{b_W} C^{b_W}_{cam} C^{cam}_{GLcam} \tag{3.31} $$

Of these four DCMs, only one will be unknown to the algorithm, $C^{b_L}_{n}$.

The translation of the lead aircraft will be referred to as $p^{GLcam}_{L}$, similar to that
described in Section 3.3 and accounting for the rotational difference between frames.
Equation (3.29) is modified to:

$$ p^{b_L}_{W} = -C^{b_L}_{n} C^{n}_{b_W}\,(C^{b_W}_{cam} C^{cam}_{GLcam}\,p^{GLcam}_{L} + p^{b_W}_{CAM}) \tag{3.32} $$

and solving for $p^{GLcam}_{L}$ yields:

$$ p^{GLcam}_{L} = -C^{GLcam}_{cam} C^{cam}_{b_W}\,(C^{b_W}_{n} C^{n}_{b_L}\,p^{b_L}_{W} + p^{b_W}_{CAM}) \tag{3.33} $$

Of these parameters, only six will be unknown to the algorithm, $C^{b_L}_{n}$ and $p^{b_L}_{W}$.

The vector $p^{GLcam}_{L}$ and matrix $C^{b_L}_{GLcam}$, with six total parameters (or six DOFs),
determine where to place the aircraft in the frustum, and at what attitude. The
algorithm predicts these six parameters from a combination of state estimations from
the Kalman filter and their final values at the previous time instant, then verifies
them with a measurement process detailed in the next chapter.

This  chapter  introduced  the  dynamics  of  AR,  the  model  that  describes  those 
dynamics,  and  the  position  requirements  of  AR  including  the  OpenGL  representation 
of  those  positions.  The  next  chapter  presents  this  thesis’  solution  to  Autonomous 
Aerial  Refueling. 

IV. Pose and Air Refueling


Building  on  the  knowledge  of  the  previous  chapters  and  incorporating  the  research 
accomplished  by  others,  this  chapter  presents  a  novel  approach  to  AAR.  The 
chapter is broken into three sections. The first details the position and orientation
estimation, commonly referred to as pose estimation, a process alluded to in previous
chapters of this thesis. The next section is a background investigation into other
approaches taken to solve AAR, and those using a rendered model approach to pose.
The final section presents the works and techniques of this thesis.

4.1  Pose 

In computer vision disciplines, it is often necessary to determine an object's
position and orientation with respect to a specified coordinate system. Estimating
an object's attitude and translation in the cam-frame ($C^{b}_{cam}$ and $p^{cam}$) is this thesis'
interpretation of pose estimation. The pose of an object allows the system to make
determinations about the object, interact with the object, or in the case of AAR, track
the object. This field of study encompasses many different methods to determine the
pose of an object with the desired accuracy. The pose process incorporates single or
multiple image collection devices and may incorporate other sensors (such as distance
measuring or three-dimensional scanning), which are real-time or post-processed for
various lengths of time.

The research presented in this thesis focuses on finding the pose of a lead aircraft
based on images alone to determine a relative navigation solution between the
two aircraft. The inherent difficulty with pose from images is estimating depth, or
translation of the object along the $Z_{cam}$ axis, as described in Section 2.1.3.5 (the
camera matrix, K, discards this information). To aid the process, a system often has
some predetermined knowledge of the camera, environment, and object (typically the
object's scale).

With this a priori knowledge, a common vision-based approach (not used in this
research) determines the pose of an object with trigonometric calculations based on a
specific number of surveyed features of the object that are located in an image. This
approach usually involves matching, from image to image, these determined features
on the object. This is typically referred to as point or feature tracking.

Tracking individual points visually on an object presents a few challenges. The
first difficulty is in accounting for features departing the FOV of the camera. If not
handled appropriately, the system will match the features to similar but incorrect
features on the object, causing residual error in the estimation. The second difficulty
concerns the lighting and shading on an object. Features that are easy to determine in
one lighting condition might not be as easy to detect in other conditions. Changes in
lighting can also introduce confusion between similar but incorrectly matched features.
A final difficulty is in updating the visual appearance of the feature with changes in
orientation. When tracking an antenna on the bottom of the aircraft, the antenna's
appearance changes depending on which side of the aircraft the camera is on.

These difficulties are not insurmountable and this approach works for many applications.
The next section details a small sampling of visual-based pose in addition
to differing methods to conduct AAR.

4.2 Current Science

This  section  presents  various  approaches  to  AAR  and  the  vision  based  approach 
to  pose.  AFRL  has  done  and  continues  to  do  extensive  research  in  the  area  of  AAR, 
and  many  projects  sponsored  by  AFRL  have  aided  this  research  immensely. 

4.2.1 AFIT. Research at AFIT began in the early 1990s and concentrated
mainly on formation flight controllers [16]. Starting with simple controllers and limited
freedoms by the lead and wing aircraft, investigations focused on real-time
autonomous controllers.

4.2.1.1 Spinelli and Ross. AFIT research into AAR reached a milestone
with thesis work done by Spinelli and Ross in 2006 [16, 24]. Both worked
independently to create a combined DGPS-dependent formation controller. Spinelli's
research focused on a combination transmitter and receiver from the lead aircraft
broadcasting real-time GPS data to the receiver aircraft. This information was processed
by a computer on-board the receiver aircraft. Algorithms devised by Ross
plotted a relative navigation solution. This solution provided control inputs to the
receiver's autopilot controller, resulting in autonomous formation flight at ranges from
10 to 100 feet and up to 30 degrees of bank.

The research was demonstrated using a TPS C-12 Huron as the simulated tanker
and a Calspan LJ-24 Learjet as the receiver, shown in Figure 4.1. The wing aircraft
maneuvered through three positions, based off the lead aircraft: contact, pre-contact,
and awaiting AR (wing aircraft immediately off the right wing of lead). Average
error was calculated to be approximately one to two feet with a maximum of approximately
four to five feet [16]. These test flights represented a tremendous step towards
achieving AAR.


Figure 4.1: Differential GPS AAR between a C-12 and an LJ-24 [16]. This successful
demonstration proved the capability of AAR and led to many future projects.

While this work became a benchmark for AAR, many improvements to the
approach were desired, specifically the incorporation of a vision-based sensor to aid
the navigation solution.

4.2.1.2 Spencer. Following the success of Spinelli and Ross, Spencer
also used a C-12 and LJ-24 to conduct test flights in 2008 using optical tracking of the
lead aircraft [23]. The test flights were not autonomous, but were accomplished in real time,
with on-board processing and data collection for additional post-flight analysis.

Spencer utilized an electro-optic (EO) sensor in conjunction with a Harris corner
detector algorithm to track multiple feature points of the lead aircraft (more than 12
per frame, as shown in Figure 4.2). The EO sensor was able to collect real-time
images for processing by an on-board computer. By making predictions of where the
corners should be with a Kalman filter, point tracking of the corners used a gating
technique. This technique limited the field of view around each tracked point that a
feature could match to, so antennae on the rear of the aircraft could not match to
antennae on the front of the aircraft. An initial position and attitude was provided
to the algorithm via a non-passive sensor [23].
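
The gating idea can be illustrated with a short sketch. This is not Spencer's implementation: it assumes a simple circular gate in pixel coordinates and nearest-neighbor matching, and the function name `gate_matches`, the gate radius, and the pixel values are hypothetical.

```python
import numpy as np

def gate_matches(predicted_px, detected_px, gate_radius_px):
    """Match each predicted feature to the nearest detection inside its gate.

    Returns one index into detected_px per predicted feature, or -1 when no
    detection falls within gate_radius_px of the prediction.
    """
    matches = []
    for pred in predicted_px:
        dist = np.linalg.norm(detected_px - pred, axis=1)
        nearest = int(np.argmin(dist))
        matches.append(nearest if dist[nearest] <= gate_radius_px else -1)
    return matches

# Two filter-predicted corner locations and two detected corners (pixels)
predicted = np.array([[100.0, 100.0], [400.0, 120.0]])
detected = np.array([[103.0, 98.0], [250.0, 250.0]])
matches = gate_matches(predicted, detected, gate_radius_px=15.0)
```

Here the second prediction has no detection inside its gate, so it is left unmatched rather than paired with a distant, unrelated corner.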


Figure  4.2:  Visual  AAR  using  feature  tracking  [23].  A  visual  approach  to  determining 
the  relative  position  between  aircraft  by  tracking  individual  points,  surveyed  to  known 
locations,  of  a  lead  aircraft. 


This process found and updated a relative position of the wing aircraft with
respect to the lead aircraft using known distance and direction between the tracked
points (a three-dimensional model of surveyed points on the aircraft). Spencer's
research demonstrated the feasibility of the vision-based navigation concept. As a
result, future research can now address some of the problems discovered in his research.
Specific errors relating to the vision-based approach to AAR are: the loss of
tracking features due to poor visibility, delays in position estimates because of image
processing time, errors related to camera calibration, and pose errors from incorrect
corner tracking.

Perhaps one of the most important lessons learned concerned camera attitude
($\psi$, $\theta$, $\phi$). The receiver aircraft’s attitude is very important in determining the
receiver’s position in relation to the tanker. Failing to account for yaw, pitch, and roll
of the receiver aircraft dramatically detracts from the algorithm’s ability to identify
the correct relative position.

As a potential solution to a few of these problems, Spencer suggested applying
group tracking instead of tracking individual points. While individual points are prone
to misinterpretation by the detection algorithm, tracking groups of points can potentially
limit those errors. He discovered that multiple point errors were not common in an
individual frame. The group of points recommendation was expanded to include the
entire tanker (designated here as the whole aircraft approach) in subsequent research
efforts.


4.2.1.3 Weaver. As a follow-up to Spencer’s work, Weaver pursued
an alternative approach to the aircraft tracking problem [29]. Weaver’s work utilized
a long-wave infrared video representation of a KC-135R (acquired from an AFRL
research initiative) in conjunction with a three-dimensional rendered model of the
same aircraft type (obtained commercially). Using an extended Kalman filter (EKF),
Weaver was able to make a priori predictions about the KC-135R’s position relative
to the receiver. These predictions became images produced through the rendering
of the three-dimensional computer model. A sum-squared difference between the
pixel intensities of the rendered and collected images determined an error between the
predicted position and the actual position. However, the calculation only provided the
magnitude of the error. Determination of the relative direction of the error required
an iterative process to correct the image for the next update. This process continually
perturbed (changed in magnitude and direction) the estimated position and orientation
of the KC-135R, creating a new image for each perturbation to compare with the
true collected image of the tanker. An example of a collected image with the predicted
image overlaid (intentionally offset) is shown in Figure 4.3.
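
As a rough illustration of why a sum-squared difference yields only the magnitude of the mismatch, consider the sketch below. The function name and the tiny 5 x 5 test images are hypothetical and are not taken from Weaver's work; the point is that a rendered image offset to the left and one offset to the right of the collected image can produce the same error value, so the direction must be found by trying perturbations.

```python
import numpy as np

def sum_squared_difference(rendered, collected):
    """Sum of squared pixel-intensity differences between two same-size images."""
    r = np.asarray(rendered, dtype=np.float64)
    c = np.asarray(collected, dtype=np.float64)
    if r.shape != c.shape:
        raise ValueError("images must have the same shape")
    return float(np.sum((r - c) ** 2))

# Hypothetical data: a single bright pixel stands in for the tanker.
collected = np.zeros((5, 5))
collected[2, 2] = 1.0

rendered_left = np.zeros((5, 5))
rendered_left[2, 1] = 1.0      # prediction one pixel too far left

rendered_right = np.zeros((5, 5))
rendered_right[2, 3] = 1.0     # prediction one pixel too far right

error_left = sum_squared_difference(rendered_left, collected)
error_right = sum_squared_difference(rendered_right, collected)
error_exact = sum_squared_difference(collected, collected)
```

Both offset predictions give the same nonzero error, and only the exact prediction gives zero, which is consistent with the need for an iterative perturbation step to recover the direction of the error.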


[Figure 4.3 image labels: Semi-Transparent Prediction; True Infrared Image]

Figure 4.3: AAR using whole aircraft tracking with infrared images [29]. A
vision-based approach determines the relative position between aircraft by comparing a
collected image with rendered images of the same aircraft. The prediction image is
intentionally offset from the true image for visual clarity.

Weaver’s research was a departure from the feature and point tracking methods
found in many research efforts. By forging this new direction, he discovered new sets
of problems, including errors in the tanker-body X-axis, roll errors, and lengthy image
processing time.

Images associated with a pitching movement by the tanker were difficult to
distinguish from images representing a relative acceleration forward. This resulted
in Weaver’s system having position estimates with 1-meter accuracy in the Y and Z
directions, but 2-meter accuracy in the X direction. In addition, this system did not
respond well to turns by the tanker, potentially because of the values predicted by the
Kalman filter for the tanker’s roll position. When a turn occurs, an incorrectly modeled
filter tends to return to, or maintain, a nominal wings-level state instead of accepting
the change in roll measurement. The majority of the time, the tanker maintains
constant speed, heading, and roll angle. Eventually, however, the aircraft does make
large roll angle changes. If the Kalman filter permits large variations to the state (a
large covariance in the dynamic-model process noise), accurate solutions are difficult.
Conversely, allowing only small variations can minimize the effects of measurements
on the Kalman state.
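
The process-noise trade-off can be made concrete with a scalar Kalman filter. The sketch below is a simplification under stated assumptions (a single state such as roll angle, an identity state transition, a direct measurement, and illustrative noise values); it is not the filter used by Weaver. It shows how the steady-state gain, the weight given to each new measurement, depends on the process noise q relative to the measurement noise r.

```python
def steady_state_gain(q, r, p0=1.0, steps=200):
    """Iterate a scalar Kalman filter and return the gain after `steps` cycles.

    Assumes state transition = 1 and measurement = state (hypothetical model).
    q: process-noise variance, r: measurement-noise variance.
    """
    p = p0
    gain = 0.0
    for _ in range(steps):
        p_pred = p + q                  # predict: covariance grows by q
        gain = p_pred / (p_pred + r)    # weight given to the new measurement
        p = (1.0 - gain) * p_pred       # update: covariance shrinks
    return gain

# Illustrative values only.
loose = steady_state_gain(q=100.0, r=1.0)   # large process noise
stiff = steady_state_gain(q=1e-4, r=1.0)    # small process noise
```

With the large process noise the gain approaches one, so the state follows every noisy measurement; with the small process noise the gain is close to zero, so a genuine roll change in the measurements has little effect on the state.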

Suggestions  by  Weaver  for  future  research  included  additional  sensors  such  as 
ranging  or  GPS  data  link  (as  introduced  by  Spinelli  and  Ross)  and  potentially  an 
Unscented  Kalman  Filter  (UKF).  The  UKF  could  enhance  the  speed  of  the  updates 
to  the  navigation  solution  by  potentially  reducing  the  processing  time  to  determine 
the  direction  of  error  between  the  two  images. 

4.2.2 Georgia Institute of Technology. Researchers at Georgia Tech have
done considerable work in the area of vision-based navigation [5,15,30]. In 2005,
Wu, Johnson, and Proctor used images of an object seen by an unmanned helicopter
to aid in the helicopter’s navigational solution. Information determined from the
images consisted of a center point and total area. Using an EKF, these measurements
provided updates to the state and covariance of the navigation solution. This system
assumed that the pose algorithm knows the position, size, and orientation of the
object before the flight. Using an object (a window of 36 ft$^2$), the system was able to
track its position within a mean error of about six feet [30].

Further research at Georgia Tech involved whole aircraft tracking. Aircraft
were tracked using their center and wingtips (designated in their research as the
center tips approach) [15]. The information about the aircraft’s center estimated
the target’s relative azimuth and elevation, while its tips estimated the aircraft’s
distance. The UKF, as implemented, tracked the target’s position, velocity, size, and
acceleration using measurements attained from the image. Results from this research
were satisfactory, and one of the future research suggestions was to incorporate a more
sophisticated representation of the target, such as a predicted rendering of the whole
target, as opposed to estimation of just the three tracked points.

4.2.3 Visual-Model Based POSE. Researchers de Ruiter and Benhabib
expanded the whole aircraft approach to include visual or textual models of the target
[17,18]. By utilizing an a priori, visual, and adaptive model of a rigid body, the
researchers were able to track the body and determine the orientation and position
of the object. A comparison of the predictive images, rendered from the
model in OpenGL, with the actual image collected by the camera determines the
accuracy of the prediction. The difference between the two images was calculated
and combined with the information used to render the initial predictive image to
determine an accurate pose of the target.

Image-based tracking is computationally demanding and difficult to use in
order to attain accurate navigation information in real time. However, de Ruiter and
Benhabib were able to attain rates of 80-100 frames per second and sub-pixel
accuracy by reducing unnecessary computations. Reduced gradient computation time was
possible by setting a region of interest around the target in the image and reducing
three color gradient images to single color gradient images [18]. The target used in
this research was a simple cube with a simple texture/image on each side.

The background research provides a baseline of the current approaches to pose
and AAR. The method of this thesis builds on these techniques, combining the benefits
of some while addressing the limitations of others. The next section presents a generic
rendered-image approach to vision-based pose, followed by the specific implementation
in this research.

4.3 RIPE

A few of the researchers in the previous section and some others are independently
developing a sub-discipline of pose that proposes a whole object method of
tracking. This type of approach addresses some of the difficulties of point tracking
alluded to in the background research by tracking an object as a whole instead of
individual features. Weaver’s approach to whole object tracking made use of rendered
images of the object. The premise of a rendered image position and orientation
estimation (RIPE¹) approach to navigation requires a three-dimensional computer model
of an object (a priori or created real-time) that can be rendered as a two-dimensional
rendered image, $I_r$. This notional approach can involve one or more images. The time
allotted for pose and the required accuracy determine the desired number of images.

The method employed by Weaver [29] in his research consisted of creating multiple
$I_r$ images and comparing each with a single $I_c$ for every measurement update. The
six DOF information to create each $I_r$ initially came from the EKF’s state estimation
in the n-frame (rotated and translated into the cam-frame) of the tanker aircraft’s
attitude and translation from the camera.

The Kalman filter’s a priori state estimation was rendered in addition to several
$I_r$ images created with perturbations in each DOF about the estimate. Computations
on the uncertainty in the state estimation (a Cholesky decomposition of the
covariance, $\sqrt{P}$) determined the necessary magnitude of the perturbations about the
attitude and translation estimates.

A sum-squared-difference calculation on the pixel intensities of each $I_r$ and the
$I_c$ found an error value for each resulting perturbation image as well as the original
state-estimated image. Combined, a set of the errors created a gradient of likelihood
for each of the six DOFs. The DOF gradient with the most potential improvement
in matching likelihood updated the estimate. This update replaced the DOF’s value
in the original estimate with the more-likely value (based on the gradient likelihood),
creating a new estimate. From this new estimate, a new set of perturbations was
determined and a new set of images was rendered. The process iterated until no
more improvements were possible, such that no perturbation images appeared more
likely than the estimate’s $I_r$.

¹The term RIPE was introduced as part of the data collection project facilitating this research
and is used in this thesis as a label for this newly developing approach to pose.

Next, a more precise, or fine-pose, measurement determination was initiated.
The fine process mimicked the coarse process with perturbations one tenth the size,
until no further improvements were possible. This final estimate represented a global,
most-likely pose of the tanker aircraft. The information required to create the final,
most-likely $I_r$ provided a measurement update to the filter.
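
The coarse and fine stages can be sketched as a one-DOF-at-a-time perturbation search. The code below is a toy illustration, not Weaver's implementation: the function names are hypothetical, and a simple quadratic `cost` stands in for rendering an image and computing its sum-squared difference against the collected image. Each iteration evaluates one positive and one negative perturbation in every DOF, keeps the single most beneficial change, and stops when no perturbation improves on the current estimate.

```python
import numpy as np

def decoupled_search(cost, estimate, step, max_iter=100):
    """Perturb one DOF at a time and keep the single best improvement."""
    est = np.asarray(estimate, dtype=float).copy()
    step = np.asarray(step, dtype=float)
    best_cost = cost(est)
    for _ in range(max_iter):
        best_trial = None
        for i in range(est.size):
            for sign in (1.0, -1.0):
                trial = est.copy()
                trial[i] += sign * step[i]
                c = cost(trial)
                if c < best_cost:           # lower cost = more likely match
                    best_trial, best_cost = trial, c
        if best_trial is None:              # no perturbation looks more likely
            break
        est = best_trial
    return est, best_cost

def coarse_then_fine(cost, estimate, coarse_step):
    """Run the coarse search, then repeat with perturbations one tenth the size."""
    coarse_step = np.asarray(coarse_step, dtype=float)
    coarse, _ = decoupled_search(cost, estimate, coarse_step)
    return decoupled_search(cost, coarse, coarse_step / 10.0)

# Hypothetical stand-in for "render, then compare with the collected image".
truth = np.array([1.0, -2.0, 0.5, 0.0, 0.0, 0.0])
cost = lambda pose: float(np.sum((pose - truth) ** 2))
pose, final_cost = coarse_then_fine(cost, np.zeros(6), np.ones(6))
```

With this smooth, single-minimum cost the search reaches the true pose. A real image-matching cost can have local minima, which is the failure mode discussed for large motions with decoupled DOFs.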

The approach was iterative; large perturbations along one DOF at a time were
rendered to determine coarse estimations of the object and then smaller perturbations
were rendered for fine estimation [29]. An assumption made in that research was the
decoupling of the six DOFs. Without this assumption, the matching process would
require 729 $I_r$ images per iteration: the state estimate plus two perturbations in each
DOF coupled ($3^n$, n = number of coupled DOFs). This important assumption reduced
the required $I_r$ images to 13 for each iteration: the initial estimate plus a single positive
and negative perturbation in each DOF ($2m+1$, m = number of decoupled DOFs).
Unfortunately, this assumption also limits the precision of the estimation.
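
The two image counts follow directly from the formulas in the text. A short sketch (function names are hypothetical) makes the arithmetic explicit for any number of DOFs:

```python
def coupled_image_count(n_dofs, perturbations_per_dof=2):
    """Estimate plus perturbations in every DOF, all combinations: 3**n for two perturbations."""
    return (perturbations_per_dof + 1) ** n_dofs

def decoupled_image_count(m_dofs, perturbations_per_dof=2):
    """One estimate image plus the perturbations in each DOF taken separately: 2*m + 1."""
    return perturbations_per_dof * m_dofs + 1

coupled_six = coupled_image_count(6)       # all six DOFs coupled
decoupled_six = decoupled_image_count(6)   # all six DOFs decoupled
```

For six DOFs these evaluate to 729 and 13 images per iteration, matching the counts quoted above.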

Decoupling the DOFs assumes the motion of an object in any DOF is independent
of the other five. Visually, a decoupling assumes that motion in one DOF can
be determined without concern for other motions. For example, if an object translates
left and up in an image, the decoupling assumes the left translation is distinguishable
in the image without accounting for the translation up.

This assumption is valid when the required perturbations about the estimated
position to make an image match are small. When large motions occur between
collected images, this assumption introduces additional error. Large visual matching
errors do not necessarily relate to position errors in the linear manner necessary
for this assumption. Weaver’s work relied on a single $I_r$ resembling the $I_c$ more than
other images and on the iterated perturbations eventually finding a global
maximum-likelihood $I_r$ as a match to the true $I_c$. However, with large motions and
decoupled DOFs, local likelihood maxima can occur at locations removed from the
true location. At these local maxima, all other potential $I_r$ images appear less
likely and the perturbation and iteration process stops, incorrectly accepting the local
maximum as the position and orientation of the tanker. These errant positions became
incorrect measurements that disrupted the estimation of the filter. Fortunately, in
his approach the perturbations about future state estimations grew because of the
bad measurements (larger covariance entailed larger perturbations), allowing the filter
to eventually recover. The cost of the assumption was both processing time and
temporary inaccuracies in the state estimate.

The  next  section  presents  an  efficient  rendered  image  approach  to  pose  with  a 
focus  on  minimizing  the  required  number  of  images  to  as  few  as  possible  while 
attempting  to  maintain  the  accuracy  of  a  coupled  DOF  approach. 

4.3.1 Quick-RIPE. To address some of the problems encountered in previous
efforts, the quick-RIPE approach uses template matching to reduce the number of
images necessary for a pose estimation of an object while not completely decoupling
the DOFs, for faster and accurate estimations. The background research and laboratory
experiments show that there is a discernible balance between speed
and accuracy. With more images rendered, the pose accuracy increases but so does
the time required for determination. The first section details a general quick-RIPE
approach, while the second section tailors this approach to AAR.

4.3.1.1 Quick-RIPE Methodology. With the use of a template matching
function, such as the one described in Section 2.2.2.2, the total number of $I_r$
images needed to determine the image location of an object in the $I_c$ reduces to one,
while effectively coupling two or more DOFs. For the rest of this thesis, the term
image location refers to the translation of the object in the $X_{image}$ and $Y_{image}$ axes
(the object’s coordinates in the image-frame). Through the linear relationship of the
camera matrix (K, Section 2.1.3.5), the object’s translations in $X_{cam}$, $Y_{cam}$, and $Z_{cam}$
determine the image location of the object in the $I_c$. The term position still refers to
the actual position of the object in the desired reference frame. This section references
two positions: the real-world position of the object, $p_o$, and the algorithm’s predicted
position of the object, $p_r$. The process involves estimating $p_r$ close enough that a
template match corrects it to $p_o$. This section details the template matching process.

Quick-RIPE template matching separates the DOFs of an object into the two
following groups, which are broken down into their components:

• Attitude plus image location: Rotation in all three cam-frame axes plus
translation in the $X_{image}$ and $Y_{image}$ axes.

• Size plus image location: Translation in the $Z_{cam}$ axis, in addition to the $X_{image}$ and
$Y_{image}$ axes.

In this grouping, size denotes the object’s translation in the $Z_{cam}$ axis.
Effectively, this grouping creates a group of five DOFs and a group of three, respectively.
At first glance, this does not appear to be much of a decoupling. However, with
template matching it reduces the rendering cost to an equivalent grouping of three DOFs
coupled and one DOF uncoupled, respectively. The grouping reduces the number of
required $I_r$ images while permitting partial coupling of the DOFs.

As a general overview, the quick-RIPE template matching renders an object
approximately close to and with approximately the same attitude as the actual object
in the camera’s FOV. First, the coupled attitude-DOFs of the object are perturbed
and each combination is rendered as an $I_r$. Each $I_r$ is then template matched to the $I_c$,
determining the most likely combination of attitudes. Second, the process is repeated with
perturbations along the $Z_{cam}$ axis, determining the most likely size of the object and, as a
result of the template matching process, the most likely image location of the object.
The combination of the translation in the $Z_{cam}$ axis, $p_{z,o}^{cam}$, and the image location of
the object determines the translations in the $X_{cam}$ axis and $Y_{cam}$ axis, $p_{x,o}^{cam}$ and $p_{y,o}^{cam}$,
through an approximated, linear, trigonometric relationship.

4.3.1.2 Quick-RIPE Template Matching. To help introduce the template
matching approach, the following incorrect but temporary assumption is made:
the attitude and size of the object are known. The reason this assumption is
incorrect and the compensation for it being incorrect are addressed in Section 4.3.1.3.
Additionally, as a necessary assumption for the template matching process to work, it
is assumed that the actual image location of the object is near the currently rendered
object image location ($p_r \approx p_o$, from an initial condition or from a recent update).
As a result of these assumptions, within a region around $p_o$, or with small $X_{cam}$ and
$Y_{cam}$ translational errors in the estimate, $p_r$, the rendered object looks similar to the
actual object.

This can be seen in the sample images of Figure 4.4: the top image is an
$I_c$ and the bottom two images are perturbation $I_r$ images along the $X_{cam}$ axis. With
such noise-free backgrounds, both will match with the aircraft in the $I_c$. However,
with the better image location estimate (the $I_r$ on the left), the rendered image comes
closer to resembling the actual object. Visually, this is distinguishable in the figure
by the difference in appearance of the two $I_r$ images. With the $I_r$ closely resembling
the $I_c$, the template matching function’s coordinates of the most-likely match ($r_{match}$)
are an estimator of $p_{x,o}^{cam}$ and $p_{y,o}^{cam}$ through an approximated, linear, trigonometric
relationship.

Presented next are the necessary components of this relationship, which include
the pixel FOV in both the $X_{cam}$ and $Y_{cam}$ axes, the approximate $p_{z,o}^{cam}$, and the
difference in size between the template image and the image it is matched against.
Presented after these three components are an overview of the template matching process
and the complete equation demonstrating the relationship.

Figure 4.4: An example $I_c$ and two perturbation $I_r$ images. The template matching
approach requires the initial estimate to be relatively close to the true position.


The first necessary component is the pixel FOV, or instantaneous FOV (IFOV),
of the camera and lens in both the $X_{cam}$ and $Y_{cam}$ axes. Dividing the entire FOV
(Equation 2.31) by the number of pixels in the array determines the pixel IFOV:

$$\mathrm{IFOV}_x = \frac{\mathrm{FOV}_x}{N} \qquad (4.1)$$

$$\mathrm{IFOV}_y = \frac{\mathrm{FOV}_y}{M} \qquad (4.2)$$

where $N$ and $M$ are the numbers of pixels in the array along the X and Y axes, respectively.
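
A small numerical sketch of the pixel IFOV follows. The function names and the camera values (a 30 degree horizontal FOV across 640 pixels) are assumptions for illustration and are not the sensor parameters used in this research.

```python
import math

def pixel_ifov(fov_rad, n_pixels):
    """Angle subtended by one pixel: full FOV divided by the pixel count."""
    if n_pixels <= 0:
        raise ValueError("n_pixels must be positive")
    return fov_rad / n_pixels

def pixels_to_angle(pixel_offset, ifov_rad):
    """Approximate angular offset for a displacement measured in pixels."""
    return pixel_offset * ifov_rad

# Hypothetical camera: 30 degree horizontal FOV across 640 pixels.
fov_x = math.radians(30.0)
ifov_x = pixel_ifov(fov_x, 640)
ten_pixel_angle = pixels_to_angle(10, ifov_x)
```

The angular offset obtained this way is the argument of the tangent function in the template match relationship developed in the rest of this section.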


The second item required, $p_{z,o}^{cam}$, is approximated with the $-p_{z,r}^{GLcam}$ translation
of the rendered object.

Finally, the difference in size between the template image and the associated
matched image is known because they both are user-defined sizes. The method applied
for this research was to create a template based on the contours of the rendered object
in the $I_r$. Because the image is clean, with no background or foreground noise, the
contour program finds the contours of the object only. The extreme contours in both
the $X_{cvimage}$ and $Y_{cvimage}$ axes create a bounding box around the rendered object.
The bounding box defines a region of interest (ROI) in $I_r$, labeled the template and
denoted as $I_t$. The template is shown as the rectangle around the aircraft in the left
side image of Figure 4.5. The center of $I_t$ is shown as a white circle in the figure.

The center of $I_t$, found in the $I_r$ (left image of Figure 4.5) but now placed on
the $I_c$, defines the center of a second bounding box; this center is shown as the small
white square in the right image. This second bounding box is labeled the match ROI,
denoted as $I_m$. The $I_m$ is shown as the larger rectangle around the aircraft in the right image
of Figure 4.5. The smaller rectangle in the right image has the same center and size
as $I_t$, simply copied onto the $I_c$. In this setup, the two aircraft do not occupy the
same position in their respective images; the rendered aircraft is half a wing length
to the left of the actual aircraft location.

The size of the $I_m$ is arbitrary. The approach in this thesis increased all four
sides equally. The symbol $\Delta_{box}$ denotes an integer number of pixels added twice to
both dimensions of the $I_t$, increasing the size of the $I_m$. Passing these two images to
the matching program creates an R matrix with dimensions $(2\Delta_{box}+1) \times (2\Delta_{box}+1)$.


The template matching process, detailed in Section 2.2.2.2, compares the $I_t$ to
the $I_m$ by starting in the top left corner. The $I_t$ is then slewed one pixel at a time
until reaching the right edge of the $I_m$, then returns to the left edge, one pixel
down, and repeats. A comparison between the two images is made at every possible
location, determining the location of the most likely match.
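
The slewing process can be sketched in a few lines. The matching method of Section 2.2.2.2 is not reproduced here, so the sketch assumes a sum-squared-difference score (lower is better) purely for illustration; the function names and the 3 x 3 test pattern are likewise hypothetical. Indices in the code are zero-based.

```python
import numpy as np

def match_template_ssd(template, match_roi):
    """Slide the template over the match ROI and return the score matrix R.

    R[y, x] is the sum-squared difference with the template's top-left
    corner placed at column x, row y of the match ROI.
    """
    t = np.asarray(template, dtype=float)
    m = np.asarray(match_roi, dtype=float)
    th, tw = t.shape
    mh, mw = m.shape
    rows, cols = mh - th + 1, mw - tw + 1
    R = np.empty((rows, cols))
    for y in range(rows):            # one pixel down after each full row
        for x in range(cols):        # slew left to right, one pixel at a time
            R[y, x] = np.sum((m[y:y + th, x:x + tw] - t) ** 2)
    return R

def best_match(R):
    """Return the (x, y) location of the most likely match (lowest SSD)."""
    y, x = np.unravel_index(np.argmin(R), R.shape)
    return int(x), int(y)

# Hypothetical data: the match ROI is the template size plus 2 * delta_box.
delta_box = 2
template = np.arange(1, 10, dtype=float).reshape(3, 3)
match_roi = np.zeros((3 + 2 * delta_box, 3 + 2 * delta_box))
match_roi[1:4, 3:6] = template       # object sits one right, one up of center
R = match_template_ssd(template, match_roi)
r_match = best_match(R)
```

The score matrix has 2 * delta_box + 1 entries per side, and a match at the zero-based center (delta_box, delta_box) would mean the object sits exactly where it was rendered.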


Figure 4.5: Template ($I_t$) and match ROI ($I_m$) creation. The center of the $I_t$ in the
$I_r$ (white dot, left image) is placed at the same coordinates on the $I_c$ to determine the
center of the $I_m$ in the right image (white square). Each dimension of the $I_m$ is $2\Delta_{box}$
greater than $I_t$.


The result of the template match is R. When accessing its values, the top
left-most coordinate of R is referenced as R(0,0). This value in R represents the
value of the matching method (Section 2.2.2.2) applied to the $I_t$ and the top,
left-most portion of $I_m$. Similarly, the bottom right-most position in R, referenced
as R($2\Delta_{box}+1$, $2\Delta_{box}+1$), represents the value of the matching method applied to
the $I_t$ and the bottom, right-most portion of $I_m$. The center of R, referenced as
R($\Delta_{box}+1$, $\Delta_{box}+1$), represents the center of the $I_m$ and the original center of the $I_t$.
Important relationship: if the most-likely match of the entire R is at the center
position, $r_{match} = (\Delta_{box}+1, \Delta_{box}+1)^T$, the translations along the $X_{cam}$ and $Y_{cam}$ axes of
the object are the same as the translations used to create the rendered object, or

$$p_{x,o}^{cam} = p_{x,r}^{GLcam} \quad \text{and} \quad p_{y,o}^{cam} = -p_{y,r}^{GLcam}$$

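Finally, combining the necessary components into an approximate trigonometric
relationship, an estimation for $p_{x,o}^{cam}$ and $p_{y,o}^{cam}$ is determined:

$$p_{x,o}^{cam} = p_{x,r}^{cam} - p_{z,r}^{cam} \cdot \tan((\Delta_{box} + 1 - r_x) \cdot \mathrm{IFOV}_x) \qquad (4.3)$$

$$p_{y,o}^{cam} = p_{y,r}^{cam} - p_{z,r}^{cam} \cdot \tan((\Delta_{box} + 1 - r_y) \cdot \mathrm{IFOV}_y) \qquad (4.4)$$

where $r_x$ is the x coordinate of the most likely match ($r_{match}$) and $r_y$ is the y coordinate.
The conversion from GLcam-frame (required to render the image) to cam-frame has
already occurred. To implement in OpenGL:

$$p_{x,o}^{cam} = p_{x,r}^{GLcam} + p_{z,r}^{GLcam} \cdot \tan((\Delta_{box} + 1 - r_x) \cdot \mathrm{IFOV}_x) \qquad (4.5)$$

$$p_{y,o}^{cam} = -p_{y,r}^{GLcam} + p_{z,r}^{GLcam} \cdot \tan((\Delta_{box} + 1 - r_y) \cdot \mathrm{IFOV}_y) \qquad (4.6)$$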
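
The cam-frame form of the relationship can be sketched numerically. The code assumes Equations 4.3 and 4.4 take the form $p_{o} = p_{r} - p_{z,r} \tan((\Delta_{box} + 1 - r) \cdot \mathrm{IFOV})$ in each axis, with match coordinates counted so that the center of R is at $\Delta_{box} + 1$; the function name and all numerical values are hypothetical.

```python
import math

def object_translation(p_r_cam, r_match, delta_box, ifov_x, ifov_y):
    """Estimate the object's X and Y cam-frame translations from a template match.

    p_r_cam: (x, y, z) translation used to render the object, cam-frame.
    r_match: (r_x, r_y) most likely match, with the center of R at delta_box + 1.
    ifov_x, ifov_y: pixel IFOV in radians.
    """
    px_r, py_r, pz_r = p_r_cam
    r_x, r_y = r_match
    px_o = px_r - pz_r * math.tan((delta_box + 1 - r_x) * ifov_x)
    py_o = py_r - pz_r * math.tan((delta_box + 1 - r_y) * ifov_y)
    return px_o, py_o

# Illustrative values: object rendered 100 units ahead, IFOV of 1 mrad.
rendered = (2.0, -1.0, 100.0)
centered = object_translation(rendered, (11, 11), 10, 0.001, 0.001)
shifted = object_translation(rendered, (21, 11), 10, 0.001, 0.001)
```

A match at the center of R returns the rendered translations unchanged, while a match ten pixels to the right moves the X estimate by roughly one unit at this range, leaving the Y estimate untouched.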


This relationship depends on a key principle, the linear mapping of K. This
linear relationship is essential, because the centers of $I_t$ and $I_m$ are arbitrarily chosen.
The center of the chosen template can be anywhere on the rendered object or not on
it at all. However, it should overlay the same relative position of the real object in
the most-likely match. The position of the template center referenced to the b-frame
of the object remains constant during the matching. Because of this, the linearity
of the K matrix allows the translation difference between the center of the template
and the center of the most likely match to be the same as if the physical origin of the
object had been matched in the images.

As an example, it is assumed the center of $I_t$ in the $I_r$ was the wingtip of an
aircraft. The position of the wingtip is a fixed translational distance in all three axes
from the defined origin of the object (in the b-frame). When matching the $I_t$ to an
$I_m$, the wingtips will match. The difference in translation (between the $I_t$ and the $I_m$)
in the $X_{cam}$ and $Y_{cam}$ axes of the wingtip is the same as the difference in translation
of the origin of the object.

Equations  (4.3  -  4.6)  are  simply  trigonometric  relationships,  relating  the  change 
in  translation  in  pixels  to  a  unit  of  measurement  based  on  the  distance  an  object  is 
from  the  origin  of  the  cam-frame.  The  completion  of  the  example,  from  Figure  4.5, 
is  shown  to  demonstrate. 

Figure 4.6 shows the $I_t$ box overlaid on the most likely matching portion of
the $I_m$ (right image of the figure). The aircraft in the $I_r$ needed to move to the right
and down slightly to match the actual aircraft in the $I_c$. The star in the right image
denotes the center of the most likely ROI in $I_m$ that matches $I_t$; the square denotes
the center of the $I_m$. Both of these symbols are placed on their related coordinates
on the R matrix in the bottom image of Figure 4.6. The distance in pixels between
them (in both the $I_c$ and the R matrix) represents the angular difference between
$p_o$ and $p_r$.

This relationship is an approximation because the distance ratios,
$p_{z,o}^{cam}/p_{x,o}^{cam}$ and $p_{z,o}^{cam}/p_{y,o}^{cam}$, determine the accuracy of the calculation. The larger
the ratios, the more accurate it becomes. This is best seen in Figure 4.7, which continues
the previous example with only an $X_{cam}$ translation difference between the positions. The
square and the star represent the same change in angular positions. Because the triangle
created with the line of sight lines emanating from the origin of the cam-frame to
both the square and star is oblique, the tangent function used in the equation is an
approximation. Cameras with a larger FOV that are locating objects relatively close
along the $Z_{cam}$ axis must use an expanded trigonometric relationship.

As a result of this relationship, the matching function effectively couples two
DOFs, the translations in the $X_{cam}$ and $Y_{cam}$ axes. One $I_r$ accomplishes the
equivalent of $(\Delta_{box}+1)^2$ perturbation images. The next section revisits the assumptions of
this section and details the effects of this coupling on the attitude plus image location
coupled group.

Figure 4.6: Results of template matching.
(a) The template from $I_r$, left image, is matched to every possible position in $I_m$,
right image.
(b) The results of the matching values are returned in the R matrix. The star denotes
the most likely match position, $r_{match}$. The square denotes the center of the $I_m$ and
the center of R.

Figure 4.7: Template match accuracy. The template match approximation requires a
large ratio between $p_{z,o}^{cam}$ and both $p_{x,o}^{cam}$ and $p_{y,o}^{cam}$ to be accurate. Because the triangle
created between the origin of the cam-frame and the square and star is oblique, the tangent
function is a close approximation only for larger ratios.

4.3.1.3 Attitude Plus Image Location. This section revokes the
assumption of correct initial attitude of the object. Instead, it is assumed that the
rendered object’s attitude is close to the actual object’s attitude. If the attitude of
the predicted object is grossly inaccurate in comparison to the actual object, the
visual appearance of the $I_r$ will most likely be different from the object in the $I_c$.

This  does  not  apply  if  the  object  is  highly  symmetric  in  some  manner,  like  a 
sphere,  where  the  attitude  or  portions  of  the  attitude  do  not  influence  the  visual 
appearance.  Assuming  a  non-symmetric  object,  visually  inaccurate  templates  will 
not  correctly  identify  the  image  location  of  the  object  with  template  matching. 

For small motions along the Zcam axis (the only DOF not addressed in this
group), the relative size of the object is of less importance to the visual appearance of
the object. It is possible to get an accurate image location in a template match with
correct attitude information and small errors in the size of the object. Determining
the size of an object is presented in the next section.

This attitude coupling reduces the perturbations required to three DOFs. With
small attitude motions of the object between Ic images and the attitude DOFs allowed
to remain coupled, the accuracy of the attitude estimation increases, iterations are
eliminated, and the process requires fewer images than coupling all six DOFs. Two
perturbations, plus the initial state estimate, for the three DOFs equates to 27 Ir
images (3^3) compared to the 729 (3^6) required to perturb all six DOFs with the same
number of perturbations in each DOF (or 243 to perturb five DOFs, 3^5).
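
The image counts quoted above follow from rendering every combination of values across the coupled DOFs. The short sketch below reproduces the arithmetic; the function name is chosen only for illustration.

```python
def rendered_images(perturbations_per_dof, coupled_dofs):
    """Images needed when every combination of values is rendered.

    Each DOF takes the initial estimate plus its perturbations.
    """
    values_per_dof = perturbations_per_dof + 1
    return values_per_dof ** coupled_dofs


for dofs in (3, 5, 6):
    print(dofs, "coupled DOFs ->", rendered_images(2, dofs), "images")
```

With two perturbations per DOF, the count grows from 27 to 243 to 729 as the number of coupled DOFs rises from three to five to six.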

The complete process renders each possible combination of attitude, the estimate
plus two perturbations, for each of the three DOFs. Each resulting Ir is then
template matched against a suitable ROI of the Ic. The attitude values associated
with the most-likely match become the attitude measurement update. It is also
possible to attain the image location with this most-likely attitude match at this point.
However, if an update requires the translation in the Zcam axis, determination of
the size of an object should occur before determination of the image location of the
object. This will permit a more accurate image location. Hence, this decoupled group
might more appropriately be titled attitude; however, all together the process is the
equivalent of coupling five DOFs at the rendering cost of coupling three.
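
A minimal sketch of the combination search described above is given here. The scoring function is a hypothetical stand-in for the render-then-template-match step, and all names and numeric values are assumptions made for illustration rather than values from this research.

```python
import itertools


def attitude_candidates(estimate, deltas):
    """Every combination of (estimate - delta, estimate, estimate + delta) per attitude DOF."""
    axes = [(angle - d, angle, angle + d) for angle, d in zip(estimate, deltas)]
    return list(itertools.product(*axes))


def most_likely_attitude(estimate, deltas, score_fn):
    """Return the candidate attitude with the highest matching score."""
    return max(attitude_candidates(estimate, deltas), key=score_fn)


# Hypothetical stand-in for "render, then template match": the score is
# highest for the candidate closest to a hidden true attitude.
true_attitude = (1.0, -2.0, 0.8)


def toy_score(candidate):
    return -sum((c - t) ** 2 for c, t in zip(candidate, true_attitude))


estimate = (0.0, -1.0, 0.0)  # initial attitude estimate, degrees
deltas = (1.0, 1.0, 1.0)     # one perturbation size per DOF, degrees

candidates = attitude_candidates(estimate, deltas)
best = most_likely_attitude(estimate, deltas, toy_score)
```

The 27 candidates correspond to the 27 rendered images, and the best-scoring candidate plays the role of the attitude measurement update.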

A box diagram of this step in the process is shown in Figure 4.8 for two
perturbations in each attitude DOF. The inputs to the system are shown on the left side
of the diagram and consist of the initial estimate in orientation (C_cam^o), the initial
estimate in position (p_o^cam), and the image collected by the camera (Ic). The latter
two are unchanged by this step in the process. The most likely attitude plus image
location results are the 6x1 vector used to render the Ir_i (i ∈ {1, 2, ..., 27}) image
that produced the highest matching value with Ic (such as the correlation coefficient
from Section 2.2.2.2). Because the translations in position are the same for all the
images, that portion of the vector does not add any new information. The attitude
measurement of the object, C_cam^o, is the unique output from this step. As a
reference, rendering and comparing only the first seven (i ∈ {1, 2, ..., 7}) images
without template matching is an example of a decoupled approach.

Figure 4.8: RIPE: attitude plus image location. The orientation of the rendered
object in the cam-frame is annotated with three Euler angles between the
cam-frame and the b-frame of the rendered object: α, β, and γ. The position of the
rendered object in the cam-frame (p_o^cam) is annotated with three individual
translations in the cam-frame: x, y, and z. The contents of the six DOF array below each
Ir_i (i ∈ {1, 2, ..., 27}) are the parameters required to create the respective image. The
perturbations are denoted as a Δ for each of the respective DOF. Each possible
combination of attitude is rendered and a ROI of each becomes an It to match against the
ROI of Ic (Im). The attitude values used to render the most likely It are recombined
to create the attitude measurement of the object (C_cam^o).

Gross inaccuracy in size does diminish the accuracy of this approach. Fortunately,
the inaccurate size affects the matching of all the images equally, diminishing
the overall effect of the error as a result of decoupling the single DOF. The next section
presents the other decoupled group.

4.3.1.4 Size Plus Image Location. With correct attitude of an object,
solving for the size and image location of the object remains. This process simply
repeats the attitude plus image location process with only translational perturbations
along the Zcam axis. Because motions along this axis are typically challenging to
determine in an image, decoupling the DOFs in this manner permits more total
perturbations about this DOF without an exponential increase in the required number
of Ir images. The final step of this process determines the size and image location of
the object (p_o^cam) based on the most-likely value of all the templates matched, and
the position in R of that most-likely match. This process effectively
couples the three translation DOFs.

A box diagram of this step in the process is shown in Figure 4.9 for four
perturbations in the Zcam axis DOF. With the better orientation estimation from the
previous step, a rendered image should have a closer resemblance to the collected
image. The most likely size plus image location results are the 6x1 vector used to
render the Ir_i (i = 1:5) image that produced the highest matching value with Ic. The
z value and the fmatch of the most likely match are recombined to create the position
measurement of the object (p_o^cam).

Figure 4.9: RIPE: size plus image location. The orientation of the rendered object
in the cam-frame is now closer to the actual orientation of the object. Each
perturbation about the Zcam translation (Δz) is rendered and a ROI of each becomes an
It to match against the ROI of Ic (Im). The z value used to render the most likely
It and the x and y values attained through the fmatch of the most likely match are
recombined to create the position measurement of the object (p_o^cam).

By decoupling the DOFs in this manner, five DOFs are coupled at the rendering
cost of three, and three DOFs are coupled at the rendering cost of one. Overall, the
entire six DOF estimate is completed at the cost of three coupled and one uncoupled
DOF with an increase of precision in the Xcam and Ycam axes. The equivalent
perturbations about these axes are much smaller than the perturbations of the other
four. The effective size and number of the perturbations in these axes depend on the
IFOV and the size of Δbox, respectively.

An example of a complete RIPE process diagram is shown in Figure 4.10 with
the incorporation of a Kalman filter and INS updates (which are detailed further in
Section 4.4).

Figure 4.10: The RIPE process. The complete process with incorporation of a Kalman
filter and INS updates. The process uses an a priori estimate converted into the
cam-frame and incorporates the INS update to create an initial estimate of orientation and
position. This initial estimate is rendered and template matched with
Ic in addition to perturbations about the initial estimate. The most likely attitude,
size, and image location of the object are combined into a measurement update (z)
for the Kalman filter to determine the relative position of the object.

Two  other  alternative  template  matching  options  are  presented  as  examples  of 
additional  coupling  possibilities.  Because  of  time  limitations,  they  were  not  further 
explored  for  this  research. 

First, a digital zoom of the Ir images (expanding and contracting the size of the
object) can create a three-dimensional template matching, with a known relationship
between the zoom and perturbations in the Zcam axis. A scale-invariant template
matching process would reduce the entire six DOF estimation to three DOFs while
increasing the precision in the Zcam axis.

Second, the first rotation of the object in the scene could be a rotation about the
ZGLcam axis, instead of the YGLcam axis used in this research. Digitally rotating the
template about its center, and creating a circular match-ROI, removes an additional
DOF rendering cost and increases the precision in that rotation.

As a last note on the quick-RIPE process, it also accounts for objects near the
edge of the Ic by artificially clipping the four sides of the It by the size difference
Δbox. This partially limits the usefulness of the It, but allows the Im to always be
2Δbox larger than the It in both dimensions. This also allows the matching process
to determine translations in both axes equally, instead of artificially limiting the
matching to directions toward the center of the image. For example, if the edge of
the It was allowed to be the edge of the Ir, the Im would be on the edge of the Ic as
well (since there is no image beyond this point to create a match-ROI around) and
the template matching could only match the current location or translational motion
towards the center of the Ic.

After determining the entire six DOFs, the pose of the object in the cam-frame
is complete. The next section applies this process to AR, further reducing the number
of Ir images required.

4.3.2 RIPE Tailored to AR. The knowledge of a tanker aircraft’s relatively
benign and predictable motion during AR allows further tailoring of the RIPE
approach. The analyzed statistical motions of the aircraft determine the necessary range,
direction, and magnitude of the perturbations about the nominal state. Decreasing
the number of perturbations about one or more states permits more perturbations in
other more dynamic states with the same cost to rendering, processing, and matching
time.

In AR, the states with the most motion as determined through empirical data
are roll and translation in the axis, as presented in Chapter 3. Even when the
frame of motion is rotated to the cam-frame, the motions are shown to be related.

The forward motion of the aircraft (a combination of north and east motion,
depending on the heading of the aircraft) rotates to a combination of motion in the
Zcam and Ycam axes. For this research, the angle between the Zcam of the camera on
the wing aircraft and the axis was approximately 30°. At this angle, a majority
of the change in forward motion is rotated into the Zcam axis (cos(30°) = 0.866). The
motion in this axis of the camera is modeled the same as the north or east motion in
the n-frame.

Fifty percent of the change in forward motion is rotated into the Ycam axis
(sin(30°) = 0.5); however, motion in both this axis and the Xcam axis is accounted for
with the template matching, as long as Δbox is large enough to cover the potential
motion in these axes.
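
The split of forward motion between the two camera axes can be checked numerically. The sketch below only reproduces the cosine and sine factors quoted above; the function name is hypothetical, and signs, which depend on the axis conventions, are not modeled.

```python
import math


def split_forward_motion(forward_change, angle_deg):
    """Project a change in forward motion onto the camera Z and Y axes."""
    angle = math.radians(angle_deg)
    return forward_change * math.cos(angle), forward_change * math.sin(angle)


z_part, y_part = split_forward_motion(1.0, 30.0)
print(round(z_part, 3), round(y_part, 3))
```

For a unit change in forward motion at 30°, about 0.866 appears along the Zcam axis and 0.5 along the Ycam axis.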

The  roll  motion  of  the  lead  aircraft  is  accounted  for  in  the  cam-frame  through 
the  following  relationship:  Separating  the  rotations  in  this 

manner,  allows  the  known  dynamics  of  to  be  realized  in  C^am-  This  separation 
also  allows  the  inclusion  of  INS  data,  detailed  in  the  next  section.  Since  is 

constant  and  known  and  is  known  from  the  INS,  the  remaining  DCM,  can 

be  modeled  as  presented  in  Chapter  3.  This  relationship  allows  the  lead  aircraft  to 
be  tracked  in  the  cam-frame  using  the  dynamics  demonstrated  in  the  n-frame.  In 
other words, the actual values of and C\am will be different, but the unknown
variability of C\am will be the same as with the knowledge of the other two
DCMs.

Modeling the lead aircraft in this manner tailors the quick-RIPE process. The
attitude plus image location reduces to perturbations in the roll DOF only. With
this reduction to one DOF, increasing the number of perturbations only incrementally
increases the number of Ir images required. The algorithm’s approach used four
perturbations about the state estimate, ±1° and ±2°. The other two rotations are
checked less often when necessary. The algorithm presented in this research updated
yaw every second during straight and level flight and every half second during turns;
pitch was updated every two seconds. These were added as an additional attitude plus
image location group and not included in the initial group with the roll DOF. This
required, at some cost to precision, only five additional Ir instead of 20 additional to
couple the rotations with roll (roll already required five Ir; two DOFs coupled requires
25 Ir total).
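
The counts in this paragraph can be reproduced with a short calculation. The sketch assumes, as the numbers above imply, that the additional rotation uses the same five values (the estimate plus four perturbations) as roll; the function names are illustrative only.

```python
def images_coupled(values_per_dof, dofs):
    """Every combination across the DOFs is rendered."""
    return values_per_dof ** dofs


def images_decoupled(values_per_dof, dofs):
    """Each DOF is perturbed in its own separate group."""
    return values_per_dof * dofs


values = 4 + 1  # four perturbations plus the state estimate

roll_only = images_coupled(values, 1)
roll_coupled_with_one_more = images_coupled(values, 2)
extra_if_coupled = roll_coupled_with_one_more - roll_only
extra_if_decoupled = images_decoupled(values, 2) - roll_only
```

Keeping the extra rotation in its own group costs five additional images, while coupling it with roll would cost 20.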

The size plus image location group retains the perturbations in translation along
the Zcam axis. The RIPE algorithm’s approach used four perturbations about the
state estimate, ±5 inches and ±10 inches. This can eventually be tailored as a
function of distance between the camera and aircraft. Perturbing by such a small
amount for aircraft farther away might not change the visual appearance of the
aircraft. In fact, for aircraft farther away, the algorithm can be reduced to attitude
plus image location, with only periodic updates to size.

The DOFs used and their perturbation amounts partially account for the potential
motions of the wing aircraft, which can vary depending on the platform and
pilot. Without other sources of data, proof of this process and the values chosen is
limited to the system it was designed for.

Through this approach, the entire quick-RIPE process requires ten images
per single Ic with extras required periodically and at an increased rate during turns.
Thus, the process is reduced to attitude, size, and location. The final portion of this
chapter details the implementation of this process with a Kalman filter.

4.4  Integrated  RIPE 

Applying the RIPE process to tracking a tanker can include an interaction
with a Kalman filter designed to track the aircraft. The Kalman filter dynamics in
Chapter 3, modified appropriately for tracking the lead aircraft in the cam-frame,
use the output of the modified-RIPE algorithm, three translational DOFs and the
roll DOF, as measurement updates, z. The remaining items of the filter, not already
presented, consist of INS attitude updates and the noise of the measurement, R.

4.4.1 INS Update. The INS interaction with the filter included attitude
updates only. Other useful information is available from the INS; however, additional
mechanization in the filter is necessary to incorporate it. To ensure the statistical
continuity of the Kalman filter, the INS does not modify the state information directly;
rather, it uses the a priori state of the filter to create the initial estimate for the
RIPE process. The state is updated after the RIPE measurement and does
not directly include the updated INS information. This effectively separates the INS
update from the statistical accuracy of the filter, while using the information as a
better initial estimate for the RIPE process.

The INS information updates the estimate by separating it into its component
rotations. The INS information updates the rotations it measures. The other values
are kept constant, and then recombined to create the initial attitude estimates for the
next RIPE measurement. Other state information from the a priori state is used to
create the initial translational estimate in the RIPE process.

Incorporation of the INS attitude information accounts for some of the attitude
motion of the wing aircraft in the tracking process. Without the inclusion of the
INS information, the movement of the tanker aircraft in the cam-frame would not
as closely resemble the predicted motion from the analysis as in the n-frame. If INS
information were not available, increasing the noise dynamic model of the filter (Q)
could partially account for this unpredicted motion of the lead aircraft.

4.4.2 Kalman Filter Measurement Noise. The last remaining requirement
of the Kalman filter is the noise of the measurement (R). The value of R depends
on the independent accuracy of the RIPE algorithm. To determine the accuracy of
RIPE, a sample run of AR collected images was processed by RIPE with no filtering,
producing relative position estimates. Without a filter, the estimate of position and
attitude from a previous time epoch became the initial six DOF estimates for the next
time epoch. Yaw and pitch were estimated every two and three seconds respectively.
The data had minimal noise in the image and minimal extraneous movements of both
aircraft.

The results of the sample run are shown in Figure 4.11. The run lasted 77.6
seconds, and accomplished 776 measurement updates at ranges from -1500 to -740
inches in the Xb axis. As a reference, the pre-contact position is approximately
-1150 inches and the contact position is in the range of -830 to -690 inches. The
wing aircraft maintained a near constant axis translation at approximately 13
feet. The run was accomplished at 30° aspect angle, typical for AR. Portions of the
aircraft were occluded from the camera FOV from approximately 46 seconds to 51
seconds. There was a loss of truth data at 10 seconds, denoted with a spike in the
graphs. The data were rotated into the b-frame for determination of the AR positions
presented in Chapter 3. The analysis of the data is shown in Table 4.1.

4.4.3 RIPE Errors. First, the errors in Xb and Zb both decrease with
decreasing range between the aircraft. The errors in Xb are approximately 3-4% of
the range: 24 inches in error at 700 inch range (two feet error in contact position) and
50 inches in error at 1500 inch range. The Yb error can be seen to have the same trend
if the removal of the bias placed the errors on the positive side of the error scale (the
Yb error would start with a higher positive error, reducing as the aircraft closed).
This is a function of resolvable distance at farther distances, where the blurring of
the aircraft in the Ic accounts for a larger percentage of pixels of the aircraft. This
blurred area of pixels makes it harder to make a precise match compared to closer
distances. This is also witnessed in the noisiness in error at those farther distances.
The error in Yb is smoother because motion in this axis was minimal.
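
The quoted percentage of range can be checked directly from the two error and range pairs given above. The function name is illustrative only.

```python
def error_percent_of_range(error_in, range_in):
    """Error expressed as a percentage of the range, both in inches."""
    return 100.0 * error_in / range_in


near = error_percent_of_range(24.0, 700.0)
far = error_percent_of_range(50.0, 1500.0)
print(f"{near:.1f}% at 700 inches, {far:.1f}% at 1500 inches")
```

Both pairs fall between 3% and 4% of the range, consistent with the statement in the text.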

Second, the error in roll has a definite discrete aspect to it. This is attributed to
the discrete perturbation allowed in roll (±1° and ±2°). This error can be decreased
with smaller perturbation values. The viability of the RIPE approach is discussed
further in Chapter 5.

Flight 35^ (Run13) with no Kalman Filter, frame

Figure 4.11: Accuracy of RIPE algorithm. A test run of the RIPE algorithm with
no filtering of the measurements. A statistical analysis of the performance, shown in
Table 4.1, determines the values of R in the Kalman filter. The top chart is the
position of the wing aircraft in the Xb axis; as the solid and dotted lines move
towards the top of the chart (time = 45 through 50 seconds), the wing aircraft is
closer to lead. The same holds for the third chart, the position of the receiver in the
Zb axis. The second chart is the wing aircraft in the Yb axis, or translations left and
right. The final plot is the roll position of the lead aircraft.

DOF       Max (-) Error   Max (+) Error   Mean Error   Standard Deviation
          (inches)        (inches)        (inches)     (inches)
Xb axis        6.0            42.0           21.0            7.0
Yb axis      -16.0            -5.0          -10.0            2.5
Zb axis      -15.0             3.0           -3.5            3.0
Roll          -2.0             0.5            0.0            0.5

Table 4.1: Data analysis of RIPE measurement error. Data were rounded to the
nearest half inch.

From the error analysis, only errors in roll met the Kalman filter requirement
of zero mean. The origins of the biases in the errors presented, and the errors to be
presented in Chapter 5, were not fully determined. The standard deviation of the
accuracy of the truth data was 18 inches; the biases shown are, at least partially,
truth data biases. Visually, the Ic and Ir are similar, as shown in Figure 4.12. The
standard deviations shown in Table 4.1, squared, determined the appropriate values in
R. From experimental experience the best match possible, Rcoeff_normed, was
approximately 0.85 (unit-less) and the worst value that appeared (visually) to still match
was approximately 0.55. A sliding scale between these values increased the value of
R. This accounted for measurements with increased visual noise.
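
One way such a sliding scale could work is sketched below. The source gives only the two end points of the matching value and the squared standard deviations; the linear interpolation and the `max_inflation` factor are assumptions made for illustration, not the scaling used in this research.

```python
BEST_MATCH = 0.85   # best matching value observed (from the text)
WORST_MATCH = 0.55  # worst value that still appeared to match (from the text)

# Standard deviations from Table 4.1, in table order (three axes, then roll).
STD_DEVS = (7.0, 2.5, 3.0, 0.5)


def measurement_noise(match_value, max_inflation=4.0):
    """Return the diagonal of R, inflated as the matching value degrades.

    max_inflation is a hypothetical tuning value, not taken from the source.
    """
    clamped = min(max(match_value, WORST_MATCH), BEST_MATCH)
    quality = (clamped - WORST_MATCH) / (BEST_MATCH - WORST_MATCH)  # 1 = best
    scale = 1.0 + (1.0 - quality) * (max_inflation - 1.0)
    return [scale * s ** 2 for s in STD_DEVS]


r_best = measurement_noise(0.85)   # variances straight from Table 4.1
r_worst = measurement_noise(0.55)  # variances inflated by max_inflation
```

At the best matching value the diagonal is simply the squared standard deviations; as the match degrades, the filter is told to trust the measurement less.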

This concludes the chapter on pose, background information on AAR including
pose applied to AAR, and finally this thesis’ approach to AAR using RIPE. The
next chapter presents the application of RIPE to real-world collected data and error
analysis.

Figure 4.12: The RIPE algorithm tested with no filtering. The contour of the Ir is
shown overlaid on the Ic at around 48 seconds in the data run shown in Figure 4.11.
The images are close enough for a measurement update, but not perfect.

V. Experimental Results


Relative navigation using images for the purposes of air refueling an aircraft
autonomously involved every aspect of the previous four chapters. The algorithm
as presented in Chapter 4 estimated the position of a lead aircraft in the camera’s field
of view, enabling the accurate determination of a relative position of the wing aircraft
for AAR. With a Kalman filter and INS updates, the research algorithm successfully
met three of the four goals of this research; it increased the accuracy of the RIPE
approach with open source libraries and did not require modification to the lead
aircraft. However, the time required to implement the solution exceeds what the author
believes is acceptable for an autopilot response. This chapter covers the creation of
the models, laboratory and field work, experimental data collection, estimation errors,
and analysis of the process.

The first step in both experiments involved creating a model of the intended
target. While it is possible to apply the RIPE process without an a priori model, an
attempt was made to ensure that the errors in this approach to pose were not from
modeling. The first section covers the creation of models for both the laboratory and
field work.

The laboratory work consisted of a basic box-shape aircraft in a Vicon® environment.
The work was a risk management step to ensure a rendered image approach
to pose was possible, and ultimately the work led to the development of the
quick-RIPE algorithm. The Vicon® system provided both a very controlled environment
and a precise truth collection process. Images of the box-aircraft were collected
simultaneously with its true position and the true position of a camera relative to a
locally defined Vicon® navigation frame. While collecting the images, the camera
was randomly moved at different angles and distances to the box-aircraft. The work
evaluated the RIPE method applied to two independently moving objects without
the use of INS data or a Kalman filter and with partial occlusion of the box-aircraft
(up to half of the object out of the FOV) in the images. Because of background and
foreground noise, the early version RIPE algorithm used contour images of both the
Ir and Ic for template matching. Lessons learned in the lab improved the algorithm
for the field work.

The field work mimicked the laboratory work in a real-world stochastic,
non-deterministic environment. This work focused on the incorporation of differing data
sources, INS data, and a Kalman filter into the navigation solution. This portion of
the process entailed flights with a TPS T-38A Talon aircraft as a simulated tanker
and a Calspan LJ-25 Learjet as a receiver. The aircraft flew maneuvers representing
those flown by actual tanker and receiver combinations in the operational realm. The
aircraft were flown by experienced pilots enrolled in TPS. A camera mounted on
the dash of the LJ-25 captured images of the T-38A while flying representative
refueling maneuvers. Multiple truth collection sources provided an initial position
for the RIPE algorithm and a validation of the process and its accuracy. The RIPE
algorithm was then executed using the recorded data after landing. The algorithm
was causal, but slowed to allow processing.

5.1 Model Creation

The models used in this research are simply a collection of coordinates for each
of the individual polygons that make up the entire model. When a program renders the
model, the OpenGL library produces a visual representation on screen, as detailed
in Section 2.2.1. In modeling the aircraft for the predictive rendering portion of the
algorithm, absolute precision is not a requirement. However, accuracy and scale are
important [18].

Accuracy is necessary to provide quantitative comparisons between the collected
and rendered images because incorrectly placed items on the aircraft will negatively
affect the solution. The approach presented in this research depends on determining
which rendered image out of a certain number most likely resembles the collected
image. If an engine pod is modeled at the wrong position on the aircraft, images
of this incorrect model, rendered at incorrect locations relative to the camera, may
appear more likely to the algorithm than those rendered at the correct location.

Scale is very important when determining the distance between aircraft. Using
an aircraft modeled with a shorter wingspan than the actual aircraft will result in
a shorter distance estimate than the actual solution. Incorrect scaling of other parts
of the aircraft will affect the solution similarly. As shown in [18], an error of 20% in the
model resulted in a four-fold increase in the tracking error.
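
The direction of the scale effect can be seen with a simple pinhole-camera argument: apparent size is proportional to physical size divided by range, so a model that is too small must be rendered closer to match the collected image. The sketch below assumes an ideal pinhole camera, and all numeric values are hypothetical.

```python
def estimated_range(true_range, true_span, model_span):
    """Range at which the model appears as large as the real aircraft.

    Assumes an ideal pinhole camera, where apparent size scales as span / range.
    """
    return true_range * model_span / true_span


true_range = 1000.0  # inches, hypothetical
true_span = 300.0    # inches, hypothetical
model_span = 240.0   # inches, a model 20% too short

est = estimated_range(true_range, true_span, model_span)
```

Under these assumptions the range estimate shrinks in proportion to the modeling error, which agrees with the shorter estimate described above.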

Creating a model from engineering diagrams is not always possible. Assuming
such diagrams exist, they would not necessarily include modifications, alterations, or
even paint schemes. Completing the model requires determining and including these
items. Some of the options to create three-dimensional models of aircraft include laser
scanning, photogrammetry, and collecting point measurements.

Laser scanning is very accurate but currently requires expensive equipment and
training. As the costs continue to drop and equipment evolves to become more
user-friendly, this option may become viable in the future. Models created using laser
scanning still need textures or images added to produce a rendered image representative
of the actual aircraft.

At  the  cost  of  some  of  the  precision  of  laser  scanning,  advances  in  photogram¬ 
metry  software  allow  very  accurate  three-dimensional  modeling  with  the  use  of  images 
from  a  known  calibrated  camera.  Photogrammetry  utilizes  the  science  of  multi-view 
perspective  geometry.  This  science  is  similar  to  triangulating  a  navigation  position 
based  on  distances  from  three  or  more  known  reference  points.  Photogrammetry 
requires  combining  several  images  of  an  object,  each  with  an  image  location  of  a 
characteristic  feature  or  dehned  point  of  the  object.  Enough  features  identihed  in 
multiple  images  determines  the  relative  positions  of  all  the  characteristic  points  in 
three-dimensional  space.  Combining  the  relative  position  of  the  characteristic  points 
creates  a  three-dimensional  model.  Correct  scaling  of  the  entire  model  only  requires 
the  addition  of  a  single  verified  distance  between  any  two  points  on  the  object  being 
modeled. 

Photogrammetry software imports the images of the object to be modeled, and the
user references characteristic points in as many images as possible. Once a point
is referenced across a few images, the software automatically computes epipolar
lines and draws them on the unreferenced images. These lines show the user where
the point should lie in each image and how closely the points match the model the
software is creating.
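
The geometry behind such epipolar lines can be sketched as follows. This is not
the commercial software's implementation, which is not described here. It assumes
two ideal calibrated cameras (normalized image coordinates, no distortion) that
differ only by a known translation; the feature and offset values are hypothetical.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def project(point):
    """Pinhole projection to normalized homogeneous image coordinates."""
    x, y, z = point
    return (x / z, y / z, 1.0)

def epipolar_line(x1, t):
    """Line (a, b, c) in image 2, with a*u + b*v + c = 0, on which the match
    for x1 must lie. Assumes camera 2 is camera 1 shifted by t with no
    rotation, so the essential matrix reduces to the cross product with t."""
    return cross(t, x1)

# Hypothetical feature in camera-1 coordinates, and the offset t such that a
# point X in camera-1 coordinates is X + t in camera-2 coordinates.
feature = (0.4, -0.2, 5.0)
t = (-1.0, 0.1, 0.0)

x1 = project(feature)
x2 = project(tuple(f + s for f, s in zip(feature, t)))
line = epipolar_line(x1, t)
residual = dot(x2, line)  # zero when x2 lies on the epipolar line
```

Because the line depends only on the first image point and the camera offset, any
feature along the same viewing ray in the first image projects onto the same line
in the second image.
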

A benefit of photogrammetry is the automatic inclusion of images with the
model creation. Multiple images texture-mapped to the object create a more
realistic model.

The  laboratory  work  of  this  research  utilized  PhotoModeler®,  a  commercial 
software  suite,  to  build  a  basic  three-dimensional  model  of  a  simple  wooden  airplane. 
Such  a  process  was  not  necessary  for  such  a  basic  shape,  but  was  a  proof  of  concept 
for  potential  use  on  an  actual  aircraft.  An  overview  of  the  PhotoModeler®  process  is 
shown  in  Figure  5.1. 

A problem encountered with photogrammetry is the difficulty of locating specific
points to reference between photos, a problem of correspondence. Various textures,
dots, or symbols applied to an object can make the process easier and, in some
cases, automated. Covering an entire aircraft with these textures is difficult and
time-consuming.

Because of the limited downtime in the aircraft’s flight schedule, the field work
modeling effort attempted to use PhotoModeler® without the use of the dots or
symbols. Unfortunately, the lack of distinguishable features on the aircraft (almost
completely white) made the process impossible without considerable effort to locate
individual rivets and joints in multiple portions of the aircraft. This method was
abandoned because of the failure to place characteristic points to reference on the
aircraft.

Figure 5.1: Creating a 3-D model with PhotoModeler®.

(a) To minimize estimation errors associated with an object, a simple box frame
aircraft was constructed out of spruce plywood.

(b) Pictures of the box-aircraft were taken from various angles and distinguishable
features were cross-referenced in PhotoModeler®. Pictures taken from underneath
were included but are not shown.

(c) After referencing the same feature in multiple images, PhotoModeler® presents
the user with epipolar estimation lines for reference-point determinations.

(d) After exporting the model as a collection of polygons, the aircraft can be rendered
using the OpenGL library.

Eventually, two other modeling methods created two different types of models,
one of which was used for the RIPE algorithm. Both models began with a wire frame
diagram built with the original specifications from Northrop Grumman, shown in
Figure 5.2. Over its lifespan, the aircraft was altered, with one particular alteration
visible in some of the rendered images. A flat disk around the top of the vertical
stabilizer at the rear of the aircraft was not in the original design. This could
contribute to some errors in roll when highlighted by the sun during flight. The error
in the model was not noticed initially and was never corrected.


Figure 5.2: Initial wire frame model. This was the beginning model used in the
research, created using the original aircraft specifications from Northrop Grumman.

The model used for this research was created by projecting an underside photo of
the aircraft onto the bottom surface of the wireframe model and attaching it. This
required only a single photo collected during flight. The photo was applied in
individual sections so it did not appear as a flat image on the bottom of the aircraft.
This is important during the rendering process because flat surfaces do not change
appearance in the same manner as contoured surfaces. By maintaining the shape of
the wire frame model with the photo texture, realistic lighting and shading can appear
on the aircraft. This model, with a close-up view from underneath the aircraft, is
shown in Figure 5.4 (a) and (b).

Figure 5.3: Texture image. This image, taken in flight, was projected on and applied
to the underside of the wireframe model of the same aircraft (shown in Figure 5.2) to
create the model shown in Figure 5.4 (a) and (b).

Figure 5.4: 3-D model with texture mapping.

(a) A photo taken of the bottom of the aircraft in flight was applied to the wire frame
model shown in Figure 5.2.

(b) A close-up view of the texture-mapped model, showing the changes in contour of
the aircraft.

There are a few caveats with this approach. First, and probably most importantly,
the process requires a picture of the bottom side of the aircraft. Unless a hoist
or crane can lift the aircraft (with the gear retracted), or stitching together multiple
individual pictures from underneath the aircraft is possible, the process involves
collecting the image(s) in flight. Even with photos taken on the ground, an aircraft
looks different on the ground (wings bend down) than it does in flight (wings bend up)
due to aerodynamic effects. This difference was minimized in this research but was
still noticeable. In-flight photography is expensive, and it’s difficult to determine the
distortion effects of the windscreen that the photographer will use. As a solution, the
same camera used in the research, which had a known distortion model, also collected
the image for the texture mapping.

Second, attaining a good perspective image of the aircraft is difficult. In
Figure 5.4 (a), the image appears smeared near the nose of the aircraft. This smearing
occurs because it would be difficult to fly directly below the aircraft and take a
picture from underneath; therefore, the image was taken at an angle from behind. The
rotated model shows the projective distortion in the photo applied to the bottom
surface. In Figure 5.4 (b), the perspective of the aircraft is similar to the perspective
from which the photo was taken, reducing the effect of the distortion.

Another modeling technique facilitated the creation of a backup model. For
this model, the entire wire frame was colored white and specific visual textures were
applied to the aircraft and colored black. For accurate truth collection in the research,
it was necessary to know the location of the truth-data and image collection equipment
by boresighting their location and orientation. The Faro® arm equipment used
to boresight those devices also precisely mapped some of the paint schemes of the
aircraft. Applying the determined positions of the paint schemes to the model and
coloring them appropriately created a more accurate texture-mapped model than
other techniques. The completed model is shown in Figure 5.5.

Figure 5.5: A model created with precise paint scheme points. This model was not
ultimately used for this research, but is a better representation of the aircraft for
future research.

This process also has problems. First, it’s expensive; the equipment, technician,
and aircraft downtime are almost as costly as flying. Second, the paint scheme
locations were precise but not absolute, and minor errors in their locations were
discovered. This presented issues when applying the color to the skin of the aircraft.
If a coordinate is incorrect towards the inside of the skin, the color is not visible
when rendered (the white skin of the wire model covers it). Similarly, if a coordinate
is incorrect to the outside of the skin, the color appears to float, unattached to the
body of the aircraft. To correct this, many of the points had to be translated along
a line normal to the surface of the aircraft so that they lay on the skin and could be
seen in renderings.
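
A minimal sketch of this kind of correction follows. It assumes, purely for
illustration, that the local skin is a circular cylinder about the x axis; the actual
model surface is a polygon mesh, and the radius and coordinates here are hypothetical.

```python
import math

def snap_to_cylinder(point, radius):
    """Move a point along the surface normal (the radial direction in the
    y-z plane) until it lies on a cylinder of the given radius about x."""
    x, y, z = point
    r = math.hypot(y, z)
    if r == 0.0:
        raise ValueError("normal is undefined on the cylinder axis")
    scale = radius / r
    return (x, y * scale, z * scale)

FUSELAGE_RADIUS = 30.0  # inches, hypothetical

inside = (100.0, 29.0, 0.0)    # would be hidden by the white skin
outside = (100.0, 0.0, -31.5)  # would appear to float off the body

fixed_inside = snap_to_cylinder(inside, FUSELAGE_RADIUS)
fixed_outside = snap_to_cylinder(outside, FUSELAGE_RADIUS)
```

Both corrected points keep their position along the fuselage and their direction
from the axis; only their distance from the axis changes.
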

Finally, the natural curves of the aircraft make this process challenging. Because
of the sheer number of points required, it’s difficult and time-consuming to
collect every point of every aspect of the paint scheme. Because of this, the Faro®
arm collected only well-defined features. Instead of collecting the entire left side of
the “U” at the front of the aircraft, the Faro® arm only determined the corners that
define the straight lines of the “U”. Unfortunately, the “U” is not straight at all, as
it curves along the side of the aircraft. When these two points were input onto the
model, the straight line between them passed through the skin of the aircraft.
These lines were hand-curved to the side of the aircraft as well as translated to the
surface.

No  attempt  was  made  to  determine  if  the  quality  or  type  of  model  affected  the 
results  of  the  navigational  approach  presented  in  this  thesis.  With  a  model  created, 
the  next  step  involved  preliminary  evaluation  of  the  process  in  a  laboratory  setting, 
detailed  in  the  next  section. 

5.2  Laboratory  Work 

The laboratory work made use of the Vicon® motion capture system to provide
the true location of the simulated aircraft. The Vicon® system has an advertised
accuracy of approximately 1 mm. Images were collected using a Prosilica® GC 1290C
camera with an 8 mm lens (simulating a wing aircraft). With this setup, representative
refueling motions were hand flown, by moving the camera only, at distances
ranging from 60 to 160 inches. Images of the box aircraft were collected at 10 Hz
during the simulated flight.

The model was constructed out of spruce plywood and was 24 inches wide, 32
inches long, and 16 inches tall (shown in Figure 5.6(b) and Figure 5.1). To allow
the camera to collect images from below and aft, the box-aircraft was suspended with
fishing wire, which minimized the visibility of the suspension wires in the collected
images. The process discussed in Section 5.1 details the creation of the virtual
representation of the model.

The  camera  had  a  4.8  mm  by  3.6  mm  sized  sensor  with  1280  by  960  pixel 
resolution.  The  camera  and  lens  combination  provided  a  FOVx  of  33.4°  and  a  FOVy 
of  25.4°  for  an  aspect  ratio  of  1.317.  Images  were  collected  of  a  standard  checkerboard 
and  a  distortion  model  was  constructed  using  the  theory  demonstrated  in  [9]  and  the 
Camera  Calibration  Toolbox  for  Matlab®  [2]. 
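
The quoted fields of view follow from the sensor size and the nominal 8 mm focal
length under an ideal pinhole model. The sketch below is an illustrative check, not
code from this research, and the function name is hypothetical. It also shows that
the quoted 1.317 matches the ratio of the two angles.

```python
import math

def field_of_view_deg(sensor_size_mm, focal_length_mm):
    """Full field of view of an ideal pinhole camera along one sensor axis."""
    return math.degrees(2.0 * math.atan(sensor_size_mm / (2.0 * focal_length_mm)))

fov_x = field_of_view_deg(4.8, 8.0)   # about 33.4 degrees
fov_y = field_of_view_deg(3.6, 8.0)   # about 25.4 degrees
ratio = fov_x / fov_y                 # about 1.317
```
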

The use of the Vicon® system was invaluable for this testing. Small tracking
devices (reflective balls), placed on the box-aircraft and the camera, allowed the system
to collect accurate position and orientation of both. The system’s cameras projected
and then detected the reflection of infrared light off the reflective balls on the objects,
triangulating their position. Figure 5.6 shows pictures of the equipment used with
the reflective balls attached and the Vicon® environment.

Collecting  the  Vicon®  data  simultaneously  with  the  images  created  a  time- 
stamped  data  source  for  each  collected  image.  A  single  laptop  computer  with  two 
ethernet  connections  collected  both  the  Vicon®  data  and  the  images. 

For various reasons, mainly because of visual noise in the scene, the images were
preprocessed prior to matching. A contour function, applied to both the collected and
rendered images, created a contour representation before initiation of the template
matching. This process was described in Section 2.2.2.1.

Figure 5.6: Vicon®-tracked objects. The cameras of the system detect infrared light
reflecting off of the balls on the objects.

(a) The camera did not meet the minimum size requirement for the Vicon® system
to track its orientation, requiring the addition of an extension.

(b) The aircraft was hung with fishing wire that was barely visible in the Ic images.

(c) The Vicon® cameras surround the tracked objects for optimum triangulation.

The simulation simplified a few aspects of AAR. No b-frame was defined, only
a cam-frame. The origin of the Vicon® system simulated the e-frame for relative
position error determination and also the n-frame for attitude information and error,
which minimized unnecessary translations and rotations. The system as defined and
used is shown in Figure 5.7.

Data collection simulated the receiver at various positions behind and below
the box-aircraft, accomplishing various random motions. These motions did not
necessarily represent true AR motions. Additionally, the box-aircraft had a natural
sway motion from a single attachment point for the fishing wire. Because of this
natural sway of the aircraft, the RIPE process included both roll and yaw in every
matching update. The algorithm periodically updated the pitch of the box, but the
suspension of the box limited the pitch motion considerably. An example result of
the algorithm’s matching is shown in Figure 5.8, with the contour of the rendered
image overlaid on the Ic.

Post-processed data runs were tracked by the RIPE process. An initial value
was given to the algorithm from the truth source. The algorithm did not make use of a
filter; the initial estimate for a measurement update was simply the final measurement
from the previous update determined by the RIPE process. The error associated with
this process, without an INS and without a Kalman filter, is shown in Figure 5.9.

The  analysis  of  the  laboratory  data  is  shown  in  Table  5.1.  It  was  assumed  that 
the  addition  of  an  INS  and  filter  should  only  increase  the  accuracy  of  the  system. 
They  were  excluded  from  the  laboratory  work  because  of  time  limitations. 

Figure 5.7: Laboratory work navigation setup. The laboratory work environment
with the simplified AAR-associated reference frames.

Figure 5.8: The laboratory work example match. The contour overlay of the rendered
image is shown on the Ic.

Figure 5.9: The laboratory work error. The work in the lab demonstrated the viability
of the RIPE approach. This error is shown in the cam-frame because there was no
defined b-frame nor defined AR positions in this setup.

DOF         Max (-) Error   Max (+) Error   Mean Error   Standard Deviation
            (inches)        (inches)        (inches)     (inches)
            -2.0            3.5             1.0          1.0
Xcam axis   -5.0            0.5             -1.5         1.0
Ycam axis   -3.0            3.5             0.5          1.0

Table 5.1: Data analysis of RIPE laboratory error. Data were rounded to the nearest
half inch.

For this single run of 520 frames, or measurement updates, at ranges from 80 to
110 inches, the algorithm performed well, with an error of approximately ±1% of the
range between aircraft in all three axes. There is an obvious delay in the Z axis of
the cam-frame (seen in the top of Figure 5.9), which is attributed to the small range
of perturbations allowed in that axis. The algorithm simply could not adjust the
translation in that axis fast enough to match the actual motion of the target, even
though it used the maximum perturbation allowed in each frame. The errors in the
Y axis and X axis are attributed to the contouring process. The contour function
as implemented found the interior and exterior contours of the box-aircraft. When
this contour was displayed, the exterior edges of the box-aircraft had two lines, one
for each of the interior and exterior contours found. To compensate, the line width
used to draw the contours was increased (creating a single line), blurring the true edge
of the box-aircraft slightly. This can be seen in Figure 5.8; the contour lines are thicker
than the actual edge of the box-aircraft. As a result, the pure black and white (with
no grey) contours and images had more of a line width to match, decreasing the
precision.
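
The Z-axis lag attributed above to the limited perturbation range is the expected
behavior of any search that bounds how far the estimate may move per update. A
minimal sketch with hypothetical numbers follows; the actual perturbation limits
are not given in this section.

```python
def track_with_limit(truth, max_step, start):
    """Follow a 1-D target when each update may move the estimate by at
    most max_step, mimicking a bounded perturbation search."""
    estimate = start
    history = []
    for target in truth:
        step = max(-max_step, min(max_step, target - estimate))
        estimate += step
        history.append(estimate)
    return history

# Hypothetical target: holds at 100 in, recedes 0.5 in per frame for
# 10 frames, then holds at 105 in.
truth = [100.0] * 5 + [100.0 + 0.5 * k for k in range(1, 11)] + [105.0] * 10
estimate = track_with_limit(truth, max_step=0.3, start=100.0)
errors = [e - t for e, t in zip(estimate, truth)]
```

While the target moves faster than the limit allows, the error grows every frame
even though the full step is used each time. The estimate only catches up once the
target slows.
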

Finally, during the data run shown here, the box-aircraft temporarily and only
partially exited the field of view around frame number 365 (witnessed in Figure 5.9 as
a large change in the translation in the Xcam axis). Figure 5.10 shows an image of the
farthest occlusion of the box-aircraft. Although the box-aircraft was at this extreme
position for only a few frames, the algorithm was able to consistently track it throughout.

The laboratory work demonstrated that such a process was viable. Errors in
flight were predicted to be similar to those seen in the lab, with an increase from
the additional complexity of flight, including motion relative to the ground, more
maneuverable objects, and a significant increase in range between them.
Unfortunately, the process was too slow to be implemented in real time. Because of
the extra time required to create contours of each image, in addition to the discovered
inefficiency of the OpenGL to OpenCV conversion, every measurement required
approximately four seconds. Most likely, this is unacceptable for close formation
aircraft navigation. A discussion of the OpenGL to OpenCV conversion is presented in
Section 5.5.3. The next section details the RIPE approach to the field work.

Figure 5.10: Partial aircraft occlusion. To test the algorithm’s ability to track with
partial occlusion, the camera was moved, limiting the amount of the box-aircraft seen
in the field of view.

5.3 Field Work

The field work entailed flights with a TPS T-38A Talon aircraft as a simulated
tanker and a Calspan LJ-25 Learjet as a receiver [10]. The data collection was
part of a test management project (TMP) for the students in the TPS class 10 Alpha.
The Here’s A Visually-Enabled Guided Air refueling System (HAVE GAS) TMP
group flew the aircraft through representative refueling formations, rejoins, closures,
and separations as demonstrated in Figure 5.11. The LJ-25 had a digital Prosilica®
monochrome camera, model GE1660, installed behind the windshield. The camera
images were collected by an on-board computer for post-flight download. The aircraft
were flown at various distances, offsets from centerline, and aspect angles. Truth data
for 36 different parameters were collected at 100 Hz and images were collected at
10 Hz.

Figure 5.11: HAVE GAS TMP data collection flights. The data collection process
doubled as a curriculum event for the TPS course.

The data collection devices consisted of GPS Aided Inertial Navigation Reference
(GAINR) units, configuration 2B (G2B), located on the two different aircraft.
These units had an internal IMU (HG-1700) and a dual-frequency (L1/L2) GPS antenna
mounted on the outside of the aircraft. A picture of the installed unit on the LJ-25 is
shown in Figure 5.12. The data were filtered post flight and partially corrected with
the boresight locations of the devices. The two GAINR units were time-synced with
GPS and recorded their data stamped with GPS time. The LJ-25 had an additional
truth collection system installed, used as a backup to the GAINR. The data attained
from the LJ-25 on-board computer created the INS source for the RIPE algorithm,
allowing for independence of the truth source from the algorithm. The truth data
had a reported accuracy in position of 18 inches and in attitude of 0.1° and were only
used for an initial position and a final comparison [7].

Figure 5.12: Truth collection device installed on the project LJ-25. A similar unit
was installed in the nose of the T-38.

The collected images were time-stamped with a file name based on the internal
clock of the Prosilica® camera. When power was applied to the camera, the
internal clock started counting from zero at a rate of 79,861,111 Hz. Because the
camera was within the pressurized cabin and the temperature was relatively constant
(±5°F), very little environmentally induced variability influenced the camera timing.
Converting the camera timestamps into GPS seconds, to match the truth data
timestamps, required the number of counts per second plus the time the camera was
initially powered. It would be challenging to determine the exact time power was
applied, so time-sync maneuvers were flown in flight. The maneuvers were visually
identifiable in both the Ic and the truth data and consisted of bank-to-bank rolls in
both aircraft. Ultimately, only the first bank maneuver by the T-38 was required; the
other rolls were less crisp than the first T-38 roll. The time sync was accomplished at
the beginning and end of the flight to determine if there was any time drift. During
these maneuvers the camera collection rate was increased to 30 Hz to minimize the
time between collections and pinpoint the exact time of maximum bank angle in the
images. An example of the T-38 at maximum roll angle with the truth data as a
contour overlay is shown in Figure 5.13.

Figure 5.13: Time sync maneuver. Four different data sources were synced together
using a few time sync maneuvers through the flight. The contour overlay is from the
combination of the two GAINR units’ truth data. A position bias between aircraft in
the truth data is visible in the unintentional offset.

For the second flight (the data used in this thesis), the camera was determined
to have been turned on 56,204.464 seconds after 0000.000 local time with a 95%
confidence level of ±72 milliseconds [10]. Accounting for this possible error in time
correlation between the three data sources resulted in a corresponding range error of
±3.5 inches with an assumed maximum closure or separation rate of 4 feet per second [10].
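
The timestamp conversion and the resulting range uncertainty can be sketched as
follows. The function and constant names are hypothetical; the numbers are the
values quoted above. The result is in seconds after 0000.000 local time, and any
fixed offset needed to reach GPS seconds is omitted.

```python
CAMERA_CLOCK_HZ = 79_861_111     # camera counter rate
POWER_ON_S = 56_204.464          # flight two: seconds after 0000.000 local time
TIME_UNCERTAINTY_S = 0.072       # 95% confidence bound on the power-on time
MAX_CLOSURE_FT_PER_S = 4.0       # assumed maximum closure or separation rate

def camera_ticks_to_seconds(ticks):
    """Convert a camera tick count to seconds on the power-on time base."""
    return POWER_ON_S + ticks / CAMERA_CLOCK_HZ

def range_error_inches(time_error_s, closure_ft_per_s):
    """Range error caused by a timing error at a given closure rate."""
    return time_error_s * closure_ft_per_s * 12.0

one_second_later = camera_ticks_to_seconds(CAMERA_CLOCK_HZ)
bound = range_error_inches(TIME_UNCERTAINTY_S, MAX_CLOSURE_FT_PER_S)
```

With these inputs the bound is 0.072 s × 4 ft/s × 12 in/ft = 3.456 inches, which
rounds to the ±3.5 inches quoted.
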

The Prosilica® camera had a resolution of 1200 by 1600 pixels with a 5.5 micrometer
pixel size, for a sensor size of 6.6 millimeters by 8.8 millimeters. The lens was a VS
Technology Corporation Mega Pixel Closed Circuit Television (CCTV) SV-0814MP
with a fixed 8.3 millimeter focal length. Focus and aperture were set on the
ground before flight; however, the pilot could adjust them in flight if required.
Typically, only one or two adjustments of the aperture were necessary in flight, once
the tanker aircraft was in the FOV.

Images of a checkerboard were taken with the system camera as mounted in
a fixed location on the windshield. The images, combined with the calibration
software [2], determined a calibrated camera model and distortion model. The field work
checkerboard collection was slightly more cumbersome and had to be partially
repeated. The combination of a wide field of view and high pitch angle of the camera
prevented a planned lifting device from getting close enough to the camera.
Additionally, the checkerboard was too flexible when held from one corner, as shown
in Figure 5.14. A reinforcement grid of wood was attached to the back for stiffening,
which made it heavy to hold and position correctly.

The camera calibration determined the K matrix to be:

\[
K = \begin{bmatrix}
1562.95 & -0.0127 & 767.84 & 0 \\
0 & 1528.77 & 607.709 & 0 \\
0 & 0 & 1 & 0
\end{bmatrix}
\tag{5.1}
\]

where the values (except skew, K(1,2), which is dimensionless) are in pixels. These
values are presented with 99.7% confidence in focal length of ± 5.5 pixels and in
principal point of ± 3.0 pixels, both rounded to the nearest half pixel. The skew value
was within ± 0.00036 with 99.7% confidence.

It is important to note the distinction between empirical values and manufacturer
specifications. The specified focal length of the lens was 8.3 mm, which should
have allowed a FOVy = 44.83° according to Equation (2.30). However, using the
empirical values from Equation (5.1) in Equation (2.30), a smaller FOVy = 43.36°
results. The focal length in pixels for the Y axis was 1528.77 pixels; with 5.5 µm
pixels, the empirical focal length was actually 8.4 mm (1528.77 pixels × 5.5 µm/pixel)
for the Y axis and 8.6 mm for the X axis. This is an important distinction when using
the OpenGL setup and must be accounted for. Additionally, the non-centered principal
point and skew of K are not accounted for in the normal OpenGL setup; this is
addressed in Section 5.5.1.
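
The empirical focal lengths quoted above follow directly from the calibrated values
and the pixel size. Equation (2.30) is not reproduced in this section, so the
field-of-view line in the sketch below rests on an assumption: it takes the image
half-height to be the calibrated principal point row, which reproduces the 43.36°
figure. The variable names are hypothetical.

```python
import math

PIXEL_SIZE_MM = 5.5e-3            # 5.5 micrometer pixels
FX_PX, FY_PX = 1562.95, 1528.77   # focal lengths from Equation (5.1)
CY_PX = 607.709                   # principal point row from Equation (5.1)

fx_mm = FX_PX * PIXEL_SIZE_MM     # about 8.6 mm
fy_mm = FY_PX * PIXEL_SIZE_MM     # about 8.4 mm

# Assumption: the half-height is taken as the principal point row.
fov_y_deg = math.degrees(2.0 * math.atan(CY_PX / FY_PX))
```
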

Figure 5.14: Collecting camera calibration images. This process was more difficult
than in the average laboratory setting; flexing in the original board caused errors in
the model.

The resulting pixel error from the distortion removal process was approximately
one pixel in both axes. This distortion-model pixel error was a result of the difficulty
in modeling the visual warping through the windshield. The camera was not initially
rotated correctly on the dash, and a final correction to rotate the camera to a higher
pitch angle brought the lens very close to the highly curved windshield. The effects of
this pixel error are not significant and are not attributed to the errors of this process.

The data were collected in the Edwards AFB, CA R-2508 complex during
September 2010. Collection was conducted during the daytime with minimal weather
impact (clouds, rain, etc.). Two flight test engineers operated the camera collection
laptop and LJ-25 truth collection system. Collections were accomplished in two- to
four-minute segments due to limitations in airspace and to minimize possible corruption
or data loss on a single run. Actual AR maneuvers may be longer in duration and are
most likely less dynamic.

5.4 Data and Errors

After the data were collected, the RIPE algorithm was tuned for the most
difficult run, which involved the Sun in the FOV and included a turn and a crossing
from one side of the aircraft to the other. This run was atypical for AR, so more benign,
representative runs characterize the quality of the RIPE approach, while the others
are presented for a determination of its robustness.

The first run demonstrates the effects of the Kalman filter on a turning track
AR maneuver. Figure 5.15 shows the error from run ten of the second flight. In this
run, the wing aircraft started 168 feet aft of the lead aircraft, closed to 62 feet,
and maintained that distance for 15 seconds before backing out. The run was conducted
at a 40° elevation, the bottom of the refueling envelope (witnessed in the more negative
axis translation, compared to Figure 4.11), and translated across the Y axis
(witnessed in the Y axis position crossing zero feet of translation). Additionally, the
lead aircraft maintained above 30° of roll angle throughout the maneuver, and the tail
of the aircraft was occluded from view for a portion of the run (approximately the 43 to
66 second marks). This result appears to resemble the errors of the measurement-only
run in Figure 4.11; however, this maneuver was more dynamic and started farther
back. The analysis of the data from Figure 5.15 is shown in Table 5.2.

The errors visible in the figure and table for run number ten have characteristics
worth examining. First, the roll errors appear less discrete than in the measurement-only
run in Figure 4.11. Instead, the roll error now has an oscillation to it. This oscillation
is still attributed to the discrete capability of the RIPE in the roll axis. The roll
angle has to go beyond 0.5° before the next closest integer degree appears more likely
(the perturbation amounts in roll were ±1° and ±2°). This affects the roll rate of
change in the Kalman filter. Large discrete jumps in roll angle appear as large rates
of change to the filter. The filter updates the roll rate of change state accordingly
and predicts a similar rate of change for the next propagation step. For small motions
in bank angle, this causes an overshoot and the oscillation seen in the roll position.
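
The overshoot mechanism can be reproduced with a small sketch. A fixed-gain
alpha-beta tracker is used here as a simple stand-in for a constant-rate Kalman
filter; it is not the filter used in this research, and the gains, the 10 Hz update
interval, and the roll values are all hypothetical.

```python
def alpha_beta_track(measurements, dt=0.1, alpha=0.5, beta=0.2):
    """Track angle and angle rate with fixed gains. Each step predicts
    with the current rate, then corrects both states from the residual."""
    angle, rate = measurements[0], 0.0
    estimates = []
    for z in measurements:
        predicted = angle + rate * dt
        residual = z - predicted
        angle = predicted + alpha * residual
        rate = rate + (beta / dt) * residual
        estimates.append(angle)
    return estimates

# Hypothetical small roll motion: 0.4 deg for 2 s, then 0.6 deg for 4 s.
true_roll = [0.4] * 20 + [0.6] * 40

# Whole-degree measurements, as if only integer roll angles were tested.
measured = [float(round(r)) for r in true_roll]

estimates = alpha_beta_track(measured)
peak = max(estimates)
```

A true change of 0.2° appears to the tracker as a 1° jump in a single update. The
rate state absorbs part of that jump, so the following predictions carry the
estimate past 1° before it settles.
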

Figure 5.15: Flight two, run ten RIPE error.

DOF     Max (-) Error   Max (+) Error   Mean Error     Standard Deviation
axis    -76.0 inches    38.0 inches     -10.0 inches   21.0 inches
axis    -16.0 inches    2.5 inches      -10.0 inches   2.5 inches
axis    -20.0 inches    36.0 inches     4.0 inches     9.0 inches
Roll    -2.0°           1.5°            0.0°           0.5°

Table 5.2: Data analysis of flight two, run ten. Data were rounded to the nearest half
inch or half degree respectively.

The error in the roll is not an important concern, since roll is just a means to
create a better visual representation of the aircraft to determine the relative position.
With such a small standard deviation and limited impact on the relative position, no
improvement to the Kalman filter's roll estimates was attempted. For brevity, roll
errors are left out of the remaining error plots.

Other errors are visible in run number ten. Similar to Figure 4.11, a bias is
visible in the Y_b axis, attributed to a bias in the truth collection. Also, interaction
between the X_b and Z_b axes is apparent. Differentiating between motions in the
different DOFs is a difficult problem in AAR [29] and in pose estimation in general.
Evident throughout the run, most notably at the beginning, is the compensation of the
RIPE algorithm. When an error appears in one axis, a visual compensation to maintain
the correct appearance of the aircraft appears in the other axis. This is most pronounced
at farther distances between aircraft and when partial occlusion of the aircraft occurs.
Because of the relatively small changes in the Y_b axis, the interaction errors are
generally not as evident in this axis; a later example will demonstrate this interaction.

In theory, template matching should reduce the effects of this interaction, and it will
if the theory is correctly implemented. The implementation is compromised when carried
out in Cartesian coordinates. Unless the object is in the center of the image, any
perturbation in the Z_cam axis appears in the image as a change in size and a change in
image location. This change alters the visual appearance of the aircraft, which in turn
changes the template matching likelihood comparison for each perturbation in the Z_cam
axis. For the size plus image location group, the theory requires that the appearance of
the aircraft remain the same while allowing a change to its size only. If correctly
implemented, the errors witnessed in the X_b axis could
resemble those seen in the Y_b axis.
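
The difference between a Cartesian Z_cam perturbation and a range-only (spherical-style) perturbation can be sketched with an ideal pinhole projection. The focal length and the sample point below are hypothetical, and the function names are illustrative; this is not the RIPE implementation.

```python
def project(point_cam, focal_px):
    """Ideal pinhole projection of (X, Y, Z) in the camera frame to (u, v)."""
    x, y, z = point_cam
    return (focal_px * x / z, focal_px * y / z)


def perturb_z(point_cam, dz):
    """Cartesian perturbation: change only the Z_cam coordinate."""
    x, y, z = point_cam
    return (x, y, z + dz)


def perturb_range(point_cam, dr):
    """Range-only perturbation: move the point along its line of sight."""
    x, y, z = point_cam
    r = (x * x + y * y + z * z) ** 0.5
    s = (r + dr) / r
    return (x * s, y * s, z * s)


F_PX = 1000.0                      # hypothetical focal length, pixels
off_axis = (100.0, -50.0, 1150.0)  # hypothetical off-center point, inches

u0, v0 = project(off_axis, F_PX)
u_z, v_z = project(perturb_z(off_axis, 10.0), F_PX)
u_r, v_r = project(perturb_range(off_axis, 10.0), F_PX)
print(u0 - u_z, u0 - u_r)
```

For the off-center point, the Z_cam perturbation moves the projected location, while the range-only perturbation leaves it where it was and changes only the apparent size. This is the independence of image location from distance that the size plus image location group relies on.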

Next, two common trends were witnessed in most of the data runs. First, a
large jump in error was present near the beginning of the runs. Second, a steadily
increasing and uncorrected error was observed as the aircraft separated.

The large jump in error at the beginning is partially caused by the incorrect initial
state provided to the Kalman filter. The initial state provided to the filter consists of
the three translation values and the roll value; the rates of change of these states and
their derivatives were set to zero. The filter takes some time to determine those values
from the measurement updates provided by the RIPE process. Additionally, the large error
at the beginning of the run is attributed to the size of the allotted perturbations at
the farther distances. The perturbations in the Z_cam axis were arbitrarily chosen to
be constant throughout the run (±5 and ±10 inches). At the farther distances, the
visual appearance of the rendered aircraft does not change dramatically, if at all, with
such small perturbations. This allows all five of the possible I_r images in the size plus
image location group to appear equally likely. With the Kalman filter determining
the rate of change of these states at the same time that errant measurements are likely,
the error is exacerbated until the aircraft are close enough together that a correction
is possible. A worst case example of this error is presented later.
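
Why a fixed perturbation loses visual effect with distance can be shown with a small calculation. Under an ideal pinhole model, apparent size scales as 1/Z, so the fractional size change from a perturbation depends only on the ratio of the perturbation to the distance. The 2,000 inch starting distance is a hypothetical value for illustration; the 1,150 inch pre-contact distance and the 10 inch perturbation come from the text.

```python
def relative_size_change(distance_in, dz_in):
    """Fractional change in apparent size when range goes from Z to Z + dZ.

    Apparent size is proportional to 1/Z under a pinhole model, so the
    result is independent of focal length and of the aircraft's true size.
    """
    return distance_in / (distance_in + dz_in) - 1.0


PRE_CONTACT_IN = 1150.0  # pre-contact distance quoted in the text
FAR_IN = 2000.0          # hypothetical farther starting distance

near_change = relative_size_change(PRE_CONTACT_IN, 10.0)
far_change = relative_size_change(FAR_IN, 10.0)
print(f"{near_change:.4%} at pre-contact, {far_change:.4%} farther back")
```

The same 10 inch step shrinks the rendered aircraft by roughly 0.86% at pre-contact but only about 0.50% at the farther distance. Scaling the perturbation with distance, rather than holding it constant, is one way to keep the rendered candidates visually distinguishable.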

Additionally, the distance between aircraft influences the errors, affecting the
beginning as well as the ending of the runs. On average, the errors are typically
3-4% of the distance between the aircraft or less, with the maximum error values less
than 7% of the distance between the aircraft (both measured in the X_b axis). As
an example, the maximum error in the X_b axis near the beginning of the run in
Figure 5.15 is 2% of the distance, and at the end of the run the error
is 4.75% of the distance. This is a resolvable distance limitation of the
images, as discussed in Section 4.4.

The second trend in the data runs is the steady, increasing, and uncorrected
error at the end of the run. This is attributed to the template matching. For an
unknown reason, the algorithm does not match the size of the receding aircraft as
accurately as it does when the aircraft is closing. This is a potential limitation of the
template matching function.

The error analysis in Table 5.3 characterizes the error in run number ten for
just the pre-contact position (1,150 inches) and closer, when accuracy is most critical
(between 38 seconds and 82 seconds in Figure 5.15). In general, most of the errors
are reduced. The most notable changes are the errors in the X_b and Z_b axes; both
are reduced in maximum error and standard deviation of error. The mean error in the
X_b axis is reduced dramatically as well; however, this is because the interaction with
the Z_b axis caused the error to transition negative for half the time. The root mean
square error is 13 inches in the pre-contact position or closer.
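
The quoted root mean square error can be cross-checked against the mean and standard deviation through the identity RMS² = mean² + σ². The sketch below assumes the 13 inch figure refers to the X_b axis and that the tabulated standard deviation is a population value; neither is stated explicitly in the text.

```python
import math


def rms_from_mean_std(mean_error, std_error):
    """RMS error from mean and population standard deviation."""
    return math.sqrt(mean_error ** 2 + std_error ** 2)


# X_b values from Table 5.3 (pre-contact and closer), in inches.
x_rms = rms_from_mean_std(0.0, 13.0)
print(x_rms)
```

With a zero mean the RMS error equals the standard deviation, which agrees with the 13 inches quoted for the pre-contact position and closer.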


DOF        Max (-) Error   Max (+) Error   Mean Error     Standard Deviation
X_b axis   -30.0 inches    18.0 inches     0.0 inches     13.0 inches
Y_b axis   -16.0 inches    -10.0 inches    -12.0 inches   1.0 inches
Z_b axis   -8.0 inches     18.0 inches     5.0 inches     6.0 inches
Roll       -2.0°           1.5°            0.0°           1.0°

Table 5.3: Data analysis of flight two, run ten, pre-contact (1,150 inches) and closer.
Data were rounded to the nearest half inch and half degree, respectively.


Figure 5.16 shows run 18, another example run of maneuvers resembling AR.
This error plot shows only the run's pre-contact to contact transition and back. The
errors during run 18 are summarized in Table 5.4. With a few differences,
the errors in this run are similar to the errors in run ten, pre-contact position and
closer (Table 5.3). The interaction between the two axes (X_b, Z_b) is evident, as is
the bias in the Y_b axis. The largest differences between the runs are the mean error
in the Z_b axis, which is now negative, and an outlier in the Y_b axis that affected the
maximum (+) error in that axis. The X_b axis error has a slightly reduced standard
deviation because of a slower separation rate between aircraft.

A final AR representative run, number 12, is shown in Figure 5.17. This run
shows a larger error in the beginning (6.5% as a function of X_b axis distance between
aircraft) that is corrected before the pre-contact position. This is shown as
an example of a worst case for the representative AR maneuvers. The error is a
combination of incorrect implementation of the template matching theory, resolvable
distance limitation, and incorrect initial conditions to the filter.

[Plot: Flight #2 (Run 18) with Kalman Filter, b-frame]

Figure 5.16: Flight two, run 18 RIPE error.


DOF        Max (-) Error   Max (+) Error   Mean Error   Standard Deviation
           (inches)        (inches)        (inches)     (inches)
X_b axis   -30.0           17.0            -0.5         9.5
Y_b axis   -12.0           1.5             -8.5         1.5
Z_b axis   -18.5           6.0             -5.0         5.0

Table 5.4: Data analysis of flight two, run 18, pre-contact position (1,150 inches) and
closer. Data were rounded to the nearest half inch.

[Plot panels: Error in X Axis (Inches), Error in Y Axis (Inches), Error in Z Axis (Inches)]

Figure 5.17: Flight two, run 12 RIPE error.


DOF        Max (-) Error   Max (+) Error   Mean Error   Standard Deviation
           (inches)        (inches)        (inches)     (inches)
X_b axis   -25.0           29.0            15           8.0
Y_b axis   -15.0           -9.0            -12          1.5
Z_b axis   0.0             15.0            10           2.5

Table 5.5: Data analysis of flight two, run 12, pre-contact position (1,150 inches) and
closer. Data were rounded to the nearest half inch. This run demonstrates a worst case
for representative AR maneuvers.

As a challenge to the algorithm, two atypical AR maneuvers were flown. The
first tested the algorithm's ability to handle extreme dynamic motions. The second
tested the algorithm's ability to handle difficult visual situations in addition to
dynamic motions.

The first challenge required the wing aircraft to initially maintain the pre-contact
position with approximately 30° roll angle. After stabilizing, the wing aircraft
initiated a rapid roll reversal, causing the lead aircraft to quickly exit the FOV of the
camera. Because of the speed at which the lead aircraft left the FOV, this maneuver
was captured and processed by the RIPE algorithm at 30 Hz. Because these dynamics
were not modeled, the Kalman filter was not used (witnessed in the discrete errors in
Figure 5.19).

Figure 5.18 shows a few sample images from the maneuver, with the contour of the
RIPE-estimated position overlaid. Figure 5.19 shows the errors resulting from
this maneuver. The errors associated with this dynamic maneuver are mainly realized
in the Z_b axis.

The second challenge combined visual difficulty and dynamic motion in a difficult
AR situation. Sample images from this maneuver are shown in Figure 5.20. The
maneuver began with the wing aircraft displaced to the left of the lead aircraft with
approximately 30° roll angle. The wing aircraft transitioned sides when the sun
started to appear in the camera FOV. The sun streaks on the windshield were a
challenge to the template matching, and the algorithm adapted the contouring-first
method used in the laboratory work. This adds to the time required to
compute a measurement and is not optimal; however, it does allow the algorithm
to continue working even with difficult visual noise. Figure 5.21 shows the errors
associated with this visually challenging maneuver. It would be very difficult to
navigate with the large errors at the end of the run. A better image processing
technique, such as a Local Illumination Normalization Filter, might produce better
results with the sun in the FOV [17]. Tracking at all through the sun was very
difficult, and the important aspect is the capability of the RIPE process, not the
image processing technique. As a note, all the errors presented in Figure 5.21 are 9%
or less as a function of distance between the aircraft.

As discussed previously, the interaction between the X_b axis and the Z_b axis is
more apparent in Figure 5.21 than in previous error charts.

Through all the runs presented in this chapter, the average processing time was
between 1.5 and 4 seconds, depending on the size of the aircraft in the image. When
the aircraft was farther away, a smaller region of interest for the template matching
allowed faster processing.

The  following  are  potential  solutions  to  the  errors  presented  in  this  section: 

•  Template Matching Theory: The theory requires independence of X_image
and Y_image from changes in Z_cam during the size plus image location group. The
RIPE digital zoom alternative approach, or the use of spherical coordinates in
the cam-frame (adjusting only the distance between the aircraft), might correct
these errors.

•  Resolvable Distances: Correctable with a higher resolution camera and
lens, which will increase the number of pixels representing the aircraft. At
farther distances, this will allow small perturbations in the Z_cam axis to make
visual changes to the appearance of the aircraft, minimizing measurement errors
passed to the filter.

•  Kalman Filter: Better initial estimates, plus potentially tracking in the
n-frame, where the motions of the aircraft were modeled [29]. Account for
translational changes of the wing aircraft by incorporating INS velocity or acceleration
values into the filter.

Improvements to this approach are necessary; however, the errors are manageable
for implementation into an autopilot solution. This concludes the section on
experimental data and error analysis. The next section presents process improvements
discovered through the research and experimentation.

Figure 5.18: Field work, dynamic motion. The collected images are shown with the
most-likely rendered perturbation images shown as overlays.

(a) Initial position; the wing aircraft maintained pre-contact position with
approximately 30° roll angle.

(b) 0.5 seconds after the wing aircraft initiated a rapid roll angle reversal.

(c) 1.3 seconds later, the lead aircraft exits the camera's FOV.


Figure 5.19: Flight two, run 28 RIPE error. The error associated with the field work,
dynamic motion shown in Figure 5.18.

Figure 5.20: Field work, visual challenge. Entire run conducted with approximately
30° roll angle. The collected images are shown with the most-likely rendered
perturbation images shown as overlays (except (g)).

(a)  Initial  position. 

(b)  Occlusion,  lead  on  the  right. 

(c)  Cross  to  the  other  side. 

(d)  Occlusion,  lead  on  the  left. 

(e)  Sun  covers  a  good  portion  of  the  wing  and  elevator. 

(f)  Backing  away. 

(g)  Actual  view  of  image  (d)  used  by  the  algorithm. 

[Plot: Flight #2 (Run 20) with Kalman Filter]

Figure 5.21: Flight two, run 20 RIPE error. The error associated with the field work,
visual challenge shown in Figure 5.20.

5.5 Areas of Improvement

Throughout the laboratory and field work, many improvements and refinements
to the process were discovered and warrant presentation for future experimentation.
They include a better representation of a camera in OpenGL, a validation process
that should be considered when using OpenGL, and a slightly more efficient method
of converting images from OpenGL to OpenCV.

5.5.1 A More Accurate OpenGL Camera Representation. As mentioned
in Section 2.2.1.2, the transformation matrix created by glFrustum( ) is actually the
combination of two transformation matrices, T_1 and T_2 [22]. The transformations
are labeled in their order of occurrence and shown in the equation as pre-multipliers.
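
The two-stage structure of glFrustum( ) can be checked numerically. The sketch below uses the matrices documented for glFrustum( ) and glOrtho( ) together with one standard perspective-to-box matrix playing the role of T_1. It is an assumption that this factorization matches the thesis's T_1 and T_2 exactly; the function names and the clipping values are illustrative.

```python
def matmul(a, b):
    """Multiply two 4x4 matrices stored as row-major nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def frustum(l, r, b, t, n, f):
    """Matrix documented for glFrustum (written row-major for readability)."""
    return [
        [2.0 * n / (r - l), 0.0, (r + l) / (r - l), 0.0],
        [0.0, 2.0 * n / (t - b), (t + b) / (t - b), 0.0],
        [0.0, 0.0, -(f + n) / (f - n), -2.0 * f * n / (f - n)],
        [0.0, 0.0, -1.0, 0.0],
    ]


def ortho(l, r, b, t, n, f):
    """Matrix documented for glOrtho; plays the role of T_2 here."""
    return [
        [2.0 / (r - l), 0.0, 0.0, -(r + l) / (r - l)],
        [0.0, 2.0 / (t - b), 0.0, -(t + b) / (t - b)],
        [0.0, 0.0, -2.0 / (f - n), -(f + n) / (f - n)],
        [0.0, 0.0, 0.0, 1.0],
    ]


def perspective_to_box(n, f):
    """Maps the pyramid frustum to a rectangular box; plays the role of T_1."""
    return [
        [n, 0.0, 0.0, 0.0],
        [0.0, n, 0.0, 0.0],
        [0.0, 0.0, n + f, n * f],
        [0.0, 0.0, -1.0, 0.0],
    ]


# Hypothetical clipping planes: left, right, bottom, top, near, far.
args = (-2.0, 3.0, -1.0, 1.5, 1.0, 100.0)
combined = matmul(ortho(*args), perspective_to_box(args[4], args[5]))
reference = frustum(*args)
max_diff = max(abs(combined[i][j] - reference[i][j])
               for i in range(4) for j in range(4))
print(max_diff)
```

Applying the perspective-to-box matrix first and the orthographic scaling second (T_2 · T_1 as pre-multipliers) reproduces the glFrustum( ) matrix to within floating-point rounding.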

The first matrix, T_1, transforms the pyramid frustum into a rectangular shape.
This process is similar to the mapping of the cam-frame to the image-frame by the K
matrix shown in Section 2.1.3.5, with a few exceptions: the T_1 transformation maintains
the Z axis translation information, and there is no skew or principal point offset. The
principal points are accounted for in the translation from CVV to GLimage-frame in
Equations  (2.38  and  2.39).  If  we  label  the  transformations  with  the  same  notation  as 
the  DGMs,  they  become:  K  — )■  and  Ti  — 

Because the location of frames in OpenGL is decided by the user, the GLcam-frame
is collocated with the cam-frame. As a result, instead of using T_1 to map the
points to the rectangular shape, the parameters of K can be used, accounting for
the difference in orientations of the frames. However, because OpenGL maintains the
Z_GLcam translation information until placing the contents of the CVV onto an image,
an additional row is needed in the K matrix. It can be shown that the third row
would be similar to the third row in T_1, with a negative sign for the difference in Z
translation directions.

To  facilitate  the  transformation,  the  cam'-frame  is  introduced.  This  frame  is  the 
projection  of  the  cam-frame  onto  the  near  clipping  plane,  and  is  shown  in  Figure  5.22. 


Figure 5.22: The GLcam-frame and cam-frame. They are collocated to permit the
use of the camera parameters to transform the pyramid frustum to the rectangular
shape. The projection of the cam-frame onto the near clipping plane denotes the
cam'-frame.

By maintaining a symmetric viewing window (not accounting for off-center principal
points), the transformation can be broken down into components:

where the two frame transformations account for the difference in orientation of the
frames; T_2 does not convert frames and only scales the points in the NDC-frame; and K,
as described in Section 2.1.3.5, was 3x4, whereas a 4x4 matrix is required in this equation.


To account for offset principal points, the creation of T_2 (or glOrtho( )) is
offset by the width and height of the desired image (right=W, left=0, top=0, and
bottom=H), and K is offset in the opposite direction by the amount of the principal
points:

The two translations can be accounted for in a single
K matrix, relabeled K_GL, including the addition of the skew value, which is solved
in the same manner as the skew in Section 2.1.3.5:

In place of the function call to glFrustum( ), glLoadMatrixf( ) loads the combination
of these two matrices. As an important note, contrary to the matrix multiplication
conventions presented here, OpenGL post-multiplies matrices and glLoadMatrixf( ) reads
its 16 values in column-major order, so the combination of these two matrices, if
stored row by row, has to be transposed before using the glLoadMatrixf( ) command.
The next section introduces a validation concept for future work using OpenGL.
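
A minimal sketch of this idea follows, assuming one commonly used construction of an intrinsics-based projection: a 4x4 K-like matrix combined with an orthographic matrix built with left=0, right=W, bottom=H, top=0. It is not necessarily identical to the thesis's K_GL. The intrinsic values are hypothetical, and the test point is expressed in a frame that looks along -Z with X and Y aligned to the image axes, so the thesis's frame-orientation transformations are omitted.

```python
def matmul(a, b):
    """Multiply two 4x4 matrices stored as row-major nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def ortho(l, r, b, t, n, f):
    """Matrix documented for glOrtho (row-major here)."""
    return [
        [2.0 / (r - l), 0.0, 0.0, -(r + l) / (r - l)],
        [0.0, 2.0 / (t - b), 0.0, -(t + b) / (t - b)],
        [0.0, 0.0, -2.0 / (f - n), -(f + n) / (f - n)],
        [0.0, 0.0, 0.0, 1.0],
    ]


def k_gl(fx, fy, skew, cx, cy, n, f):
    """4x4 intrinsics-style matrix with an added depth row (assumed form)."""
    return [
        [fx, skew, -cx, 0.0],
        [0.0, fy, -cy, 0.0],
        [0.0, 0.0, n + f, n * f],
        [0.0, 0.0, -1.0, 0.0],
    ]


def column_major(m):
    """Flatten a row-major 4x4 into the column-major order glLoadMatrixf reads."""
    return [m[r][c] for c in range(4) for r in range(4)]


def to_pixel(proj, point, width, height):
    """Apply the projection and viewport mapping; pixel origin at top-left."""
    vec = (point[0], point[1], point[2], 1.0)
    clip = [sum(proj[i][j] * vec[j] for j in range(4)) for i in range(4)]
    ndc_x, ndc_y = clip[0] / clip[3], clip[1] / clip[3]
    return ((ndc_x + 1.0) * width / 2.0, (1.0 - ndc_y) * height / 2.0)


W, H, NEAR, FAR = 640.0, 480.0, 1.0, 5000.0             # hypothetical image and planes
FX, FY, SKEW, CX, CY = 800.0, 810.0, 0.5, 322.0, 238.0  # hypothetical intrinsics

proj = matmul(ortho(0.0, W, H, 0.0, NEAR, FAR),
              k_gl(FX, FY, SKEW, CX, CY, NEAR, FAR))
gl_matrix = column_major(proj)   # the 16 values to hand to glLoadMatrixf

point = (30.0, -20.0, -1150.0)   # hypothetical point, 1150 units ahead
u, v = to_pixel(proj, point, W, H)
print(u, v)
```

The resulting pixel location agrees with the usual intrinsic projection, u = fx·X/d + s·Y/d + cx and v = fy·Y/d + cy with d the depth in front of the camera, so the rendered image uses the same camera model as the collected one. The `column_major` helper performs the transposition that the loading step requires.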


5.5.2 Validation. Because of the many transformations, units, data sources,
and people involved in this research effort, many items were not overseen by the
author. Many items were assumed accurate until proven otherwise, which proved to
be an incorrect assumption. Because of the many sources of information, an ad-hoc
validation process was created to address some of these problems. The validation process
amounted to rendering the truth data and visually determining the accuracy of the
entire process: the truth data, the model, and the OpenGL rendering process. It
was quickly determined that matching images in an unvalidated process introduced
significant errors in the ultimate solution.

The truth data collected were also used in this validation process. For future
work, the validation should use a dissimilar truth-data set. Because the same data
were used here, errors associated with the validation of the process cannot be
separated from those of the navigation-solution algorithm.

The validation process involved rendering images using the truth data and comparing
them with collected images from the same time instant. Because of errors in
boresighting, data collection equipment, truth data collection,
model creation, and OpenGL implementation, there will be some visual differences
between rendered and collected images. The goal of validation is to minimize the
visual difference between the two types of images by determining these process errors,
truth-data biases, and incorrect process operations. The validation process defined
here was based on human estimation of image-match comparison, because this was
an unforeseen complication in the research. Better comparison methods are available
and should be addressed in more depth in future projects involving OpenGL. Comparison
methods such as those described in Section 2.2.2.2 (non-template matching
versions) can be used for this validation process.

A  quick  description  of  the  validation  process  is  presented,  with  an  example 
shown  in  Figure  5.23,  followed  by  the  determined  errors. 

Taking truth data from various times throughout the data collection process,
images were created with the same parameters as the collected I_c image. A contour
function (Section 2.2.2.1) was used to determine the outline of the rendered aircraft in
the I_r. This outline was added to the I_c for a visual analysis of similarity. A
purpose-built graphical user interface (GUI) perturbed the rendered aircraft in all six
degrees of freedom to allow a more accurate overlay of the I_r on the I_c. The
perturbations in the GUI could be accomplished in the cam-frame or b-frame of either
aircraft, as changing translation and attitude in each of these frames results in
different visual changes.

By perturbing the truth values of multiple images at different times during the
flight (at different attitudes and different distances between the aircraft), a validation
of the OpenGL rendering process was determined. The validation process discovered
that there were at least five biases consistent throughout the flight. The first two
were roll biases, evidently caused by errors in the boresighting of the C2B and/or the
inclusion of those values into the truth data. Neither of these two processes involved
the author, and no determination about them could be made. These biases were confirmed
at various other portions of the flight: cruise, landing, and takeoff. Aircraft
in these flight phases are generally at zero degrees of bank, yet both aircraft were
consistently showing approximately five degrees of bank during these times.

Figure 5.23: OpenGL validation process.

(a) I_c from the test flights.

(b) Truth data collected with each I_c is used to create an I_r.

(c) An edge detector was used to determine the contour of the I_r.

(d) The contour outline was added to the I_c for visual similarity comparison.

(e) Required perturbations about the truth state determine the validity of the process.

After  removing  the  roll  biases,  the  other  three  biases  were  between  the  truth 
sources.  Throughout  the  validation  process,  the  required  translational  perturbations 
at  each  position  were  fairly  consistent  in  each  axis.  These  translational  differences 
were  averaged  across  the  flight  for  an  overall  estimated  bias.  The  cause  of  these  biases 
was  never  determined;  however,  they  were  fairly  constant  throughout  the  flight  (the 
standard  deviation  between  the  determined  biases  was  between  five  and  ten  inches). 
They  were  most  likely  from  errors  in  the  addition  of  the  boresighting  values  to  the 
truth  data. 
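
The averaging step described above can be sketched as follows. The correction values are made up purely to show the arithmetic and are not data from the flight. The function name is illustrative.

```python
from statistics import mean, pstdev


def estimate_bias(corrections_in):
    """Return (average correction, spread) for one axis, in inches."""
    return mean(corrections_in), pstdev(corrections_in)


# Hypothetical Y-axis corrections (inches) found at several validation instants.
y_corrections = [-12.0, -4.0, -15.0, -9.0, -10.0]
bias, spread = estimate_bias(y_corrections)
print(bias, spread)
```

A spread that is small relative to the average supports treating the offset as a constant bias that can be removed from the truth data. A large spread would suggest an error that varies with attitude or distance instead.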

The perturbation required for correcting the I_r in Figure 5.23 to overlay on the
I_c was approximately ten inches in both the camera Y axis and Z axis. Because of the
difficulties in determining translations in the X axis for both humans and computers,
the visual-estimate validation process is not as accurate as it could be.

As part of the entire process, an improvement to the conversion from OpenGL
to OpenCV images is presented next.

5.5.3 OpenGL to OpenCV improvements. It was evident early in the research
that the process detailed in Section 2.2.3 is not as efficient as it needs to be.
Rendering multiple images by OpenGL is quick and so is the comparison of those
images (once converted) in OpenCV. However, to utilize the benefits of each library,
a quicker conversion between them is needed.

This research used a slightly more efficient method as a partial solution, shown
in Listing 5.1. This method uses an OpenGL function, glReadPixels( ) (also used
in the example in Section 2.2.3), to copy pixels of the image to another memory
location. However, instead of the pixel-parsing process of Section 2.2.3, the entire
image transfers to OpenCV collectively, in one step, with the image still requiring a
flip.


Listing 5.1: Simpler OpenGL to OpenCV image conversion pseudocode
glReadPixels(width, height, CVimage->imageData)
cvFlip(CVimage)


This warrants some explanation. Even though OpenCV references the pixels
of an image in a matrix as shown in Figure 2.24, the memory storage used is similar
to the OpenGL image storage as shown in Figure 2.23. OpenCV simply accesses
those memory locations for the user, through the IplImage structure. This does not
completely eliminate the delay needed to transfer the image, mainly because of the
inefficiency of the glReadPixels( ) function. This function was created to copy a portion
of the screen. It was not created to read an entire image. The image must first be
rendered, placed into the video buffer, and all other OpenGL commands completed
before glReadPixels( ) is allowed to run and find the pixels in the user's region of
interest. A more efficient solution would possibly render the OpenGL image directly to
the memory location of a blank OpenCV image, skipping the video buffer completely.
Finding an efficient conversion between the two libraries was not further researched,
but will be critical to the success of this approach to navigation.
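
The flip is still needed because OpenGL returns rows starting from the bottom of the image, while the OpenCV image is read starting from the top row. The sketch below shows the row reversal on a raw byte buffer. It assumes tightly packed rows with no padding, which real buffers may not satisfy, and the function name is illustrative.

```python
def flip_vertical(buf, width, height, channels=3):
    """Reverse the row order of a tightly packed image buffer.

    Assumes no row padding; real buffers may be padded to an alignment
    boundary, in which case the stride must include that padding.
    """
    stride = width * channels
    if len(buf) != stride * height:
        raise ValueError("buffer size does not match the image dimensions")
    rows = [buf[r * stride:(r + 1) * stride] for r in range(height)]
    return b"".join(reversed(rows))


# 2x2 single-channel image stored bottom row first, as OpenGL returns it.
bottom_up = bytes([3, 4, 1, 2])
top_down = flip_vertical(bottom_up, width=2, height=2, channels=1)
print(list(top_down))
```

Only whole rows move, so pixel order within each row is untouched. This is why the conversion can copy the image in one block and then apply a single vertical flip.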

5.6 Summary

The RIPE algorithm as presented was able to meet three of the four goals
outlined in Chapter 1. The presented algorithm, despite multiple discovered sources
of error, met the following goals: it increased the accuracy of whole-aircraft tracking,
required no modification to the lead aircraft, and used only open source programming
libraries. The algorithm did not satisfactorily complete the fourth goal: complete
navigation updates in a timely manner.

The errors relating to the position estimation of RIPE ranged from 0-9% of the
distance between the aircraft, including the most dynamic of maneuvers and challenging
environmental noise. Average error in the pre-contact position and closer was
2-3%, or less than two feet of error at a distance of 62.5 feet. Based on the author's
experience with AR, this is as accurate as most pilots in the contact position and more
accurate than pilots in the pre-contact position. Because the errors were often close
to or less than the accuracy of the truth data, an accurate characterization is difficult;
however, this process consistently showed less than five feet of error in the contact
position, meeting the objective. Assuming that a boom operator can fly the
boom as needed in the envelope to effect the rendezvous, this accuracy is sufficient to
permit AAR.
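
The percentage and absolute figures quoted above can be reconciled with a short calculation using only the values given in the text. The function name is illustrative.

```python
INCHES_PER_FOOT = 12.0


def error_at_distance_in(distance_ft, percent):
    """Error in inches corresponding to a percentage of the separation distance."""
    return distance_ft * INCHES_PER_FOOT * percent / 100.0


low = error_at_distance_in(62.5, 2.0)
high = error_at_distance_in(62.5, 3.0)
within_two_feet = high < 2.0 * INCHES_PER_FOOT
print(low, high, within_two_feet)
```

At 62.5 feet (750 inches), 2-3% corresponds to 15 to 22.5 inches, which is consistent with the stated bound of less than two feet.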

From  the  pictures  taken  of  the  aircraft,  no  texture  or  paint  modifications  were 
done  to  the  lead  aircraft.  A  model  of  the  aircraft  using  current  paint  schemes  was 
determined  to  be  adequate. 

The open source C programming libraries worked well for this process, and within
each library, operations were quick enough to accomplish the necessary processing.
However, the conversion between libraries was not efficient enough to permit real-time
processing. Until an efficient method is determined, or other processing time
efficiencies are discovered, the RIPE process using OpenGL and OpenCV will be
difficult to implement in real time.

VI. Conclusions


This thesis presented a method of image-aided navigation for AAR. This chapter
concludes the thesis with a final assessment of the methods presented and
recommendations for future research.

6.1  Conclusions 

Many  aspects  of  the  RIPE  algorithm  worked  well:  the  rendering  aspects  of 
OpenGL  in  conjunction  with  the  image  manipulation  and  comparison  functions  of 
OpenCV  provided  accurate  representations  and  analysis  of  the  real-world.  Other 
aspects  of  the  RIPE  approach  were  not  optimally  implemented:  the  incorrect  use 
of  size  in  template  matching,  perturbations  which  were  too  small  to  cause  visual 
differences  in  appearance,  and  most  obviously  processing  speed. 

6.1.1 OpenGL. With a new method to represent a camera in OpenGL,
the renderings produced were remarkably similar to the camera images. With a
photo-textured bottom on the aircraft, high matching values between the two images
were possible. However, as is noticeable in many of the photos throughout the thesis,
the OpenGL lighting never accurately represented the real world. This was due to time
limitations in this thesis, not to OpenGL itself. In fact, this limitation drove the
algorithm to mismatch images (find the direct opposite image), which required a very
light background in the renderings instead of the dark background of the collected
images. This ultimately added to the errors reported in Chapter 5. A better lighting
environment, including ambient and direct sunlight, would create a better representation
of the real world, thus reducing a portion of the estimation error. With the location
of the aircraft known in addition to the time of day flown, the sun’s trajectory with
respect to the aircraft can be modeled and included in the lighting scheme. By modeling
a better lighting environment, it might be possible to use the simple matching
algorithms presented in Chapter 2 rather than developing more complex ones.
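As an illustration of the sun-direction idea above, the following is a minimal sketch and not part of the thesis implementation. It assumes the sun's azimuth and elevation are already available from an external ephemeris for the known location and time, and that the aircraft is wings-level so only heading matters. The function names and the North-East-Down and body-frame conventions are assumptions made for this example.

```python
import math


def sun_direction_ned(azimuth_deg, elevation_deg):
    """Unit vector pointing toward the sun in a North-East-Down frame.

    Azimuth is measured clockwise from north, elevation above the horizon.
    Both angles are assumed to come from an external ephemeris.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    north = math.cos(el) * math.cos(az)
    east = math.cos(el) * math.sin(az)
    down = -math.sin(el)  # a sun above the horizon has a negative "down" part
    return (north, east, down)


def ned_to_body_yaw_only(vec_ned, heading_deg):
    """Rotate an NED vector into a forward-right-down body frame.

    Assumption: pitch and roll are zero, so only heading is applied.
    """
    psi = math.radians(heading_deg)
    north, east, down = vec_ned
    forward = math.cos(psi) * north + math.sin(psi) * east
    right = -math.sin(psi) * north + math.cos(psi) * east
    return (forward, right, down)


# Example: aircraft heading east, sun due south at 45 degrees elevation.
sun_body = ned_to_body_yaw_only(sun_direction_ned(180.0, 45.0), 90.0)
```

In this example the sun lies off the right wing and above the aircraft. A vector of this kind could then be supplied to the renderer as a directional light, after conversion to whatever axis convention the rendered scene uses.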

Another limitation of the OpenGL process was the model itself, which was never
fully completed. Many aspects of the aircraft were not accounted for in the model
used, most notably the remaining paint schemes, including the vertical tail and sides
of the fuselage, and the oblong, curved disk at the top of the vertical tail.
The backup model accounts for a few of these aspects, but it is still not complete and
was not validated in time for completion of this thesis.

After incorporating the new camera representation and the partially completed
model, and after the validation process, the location, attitude, and size of the rendered
aircraft closely matched the true aircraft. There was some residual error as a
result of the arbitrarily located zFar clipping plane. Because of the scaling involved in
the OpenGL process, an infinite zFar clipping plane is not possible, and recreating a
real-world scene at far distances (beyond the distances of contact and pre-contact) was
challenging. The errors at these far distances were exacerbated by the relatively small
size of the perturbations at the farther distance. Either this limits how far away an
object can be for the RIPE approach to be valid, or dynamic zNear and zFar clipping
planes are needed to compensate.
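The dynamic clipping plane idea can be sketched as follows. This is an illustrative example only. It assumes a standard perspective projection, for which the normalized window depth of a point at distance d is f(d - n) / (d(f - n)), with n and f the zNear and zFar distances. The thesis does not state that depth-buffer resolution is the source of the residual error, so the example only shows how the choice of planes changes the depth scaling. The function names, the 24-bit depth buffer, and the bracketing rule are assumptions.

```python
def window_depth(d, z_near, z_far):
    """Normalized depth (0 at zNear, 1 at zFar) under a perspective projection."""
    return z_far * (d - z_near) / (d * (z_far - z_near))


def depth_resolution(d, z_near, z_far, bits=24):
    """Approximate smallest distance step resolvable at distance d.

    Obtained by inverting the slope of window_depth for one depth-buffer
    increment of 1 / 2**bits. The bit depth is an assumed value.
    """
    return d * d * (z_far - z_near) / (z_far * z_near * (2 ** bits))


def dynamic_clip_planes(expected_range, half_depth, min_near=0.1):
    """Hypothetical rule: bracket the expected object range with both planes."""
    z_near = max(min_near, expected_range - half_depth)
    z_far = expected_range + half_depth
    return z_near, z_far


# Compare fixed planes with planes that follow the expected tanker range.
fixed = depth_resolution(100.0, 1.0, 1000.0)
near, far = dynamic_clip_planes(100.0, 40.0)
tracked = depth_resolution(100.0, near, far)
```

With the planes bracketing the expected range, the resolvable step at that range is much smaller than with widely separated fixed planes. All distances are in the same arbitrary unit.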

6.1.2 OpenCV. The performance of OpenCV was remarkable. The contouring
and template matching functions were essential to the RIPE approach. Improvements
to the template matching process are possible: a more thorough trigonometric
equation might reduce some of the estimation errors. Overall, it worked very
well for the position and orientation estimation. The template matching did suffer
at farther distances, mainly because of the resolvable distances of the images. Another
method, such as the center and tips approach in [15], might improve performance at
farther distances, with a transition to RIPE at closer distances. This would also allow
the zFar plane to remain at a fixed distance, closer to the camera origin.
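For readers unfamiliar with template matching, the sketch below shows one common scoring rule, zero-mean normalized cross-correlation, on small grayscale arrays. It is written in plain Python for clarity and is not the OpenCV code used in this research. The specific matching method used by RIPE is described in earlier chapters and may differ. The function names are hypothetical.

```python
import math


def ncc_score(patch, template):
    """Zero-mean normalized cross-correlation of two equal-size 2D lists."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mean_p = sum(p) / len(p)
    mean_t = sum(t) / len(t)
    num = sum((a - mean_p) * (b - mean_t) for a, b in zip(p, t))
    den = math.sqrt(sum((a - mean_p) ** 2 for a in p)
                    * sum((b - mean_t) ** 2 for b in t))
    # A flat patch or template has no structure to correlate; score it as 0.
    return num / den if den > 0 else 0.0


def match_template(image, template):
    """Slide the template over the image and return (best_score, row, col)."""
    t_rows, t_cols = len(template), len(template[0])
    i_rows, i_cols = len(image), len(image[0])
    best = (-2.0, 0, 0)  # any real score lies in [-1, 1]
    for r in range(i_rows - t_rows + 1):
        for c in range(i_cols - t_cols + 1):
            patch = [row[c:c + t_cols] for row in image[r:r + t_rows]]
            score = ncc_score(patch, template)
            if score > best[0]:
                best = (score, r, c)
    return best


# A 5x5 image with a small pattern placed at row 2, column 1.
template = [[1, 2], [3, 4]]
image = [[0] * 5 for _ in range(5)]
image[2][1:3] = [1, 2]
image[3][1:3] = [3, 4]
best_score, best_row, best_col = match_template(image, template)
```

The location of the best score gives the image position of the match. The score itself indicates how closely a rendered image resembles the collected one.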

6.1.3 RIPE. The research presented, including the resulting errors, demonstrated
RIPE as a valid approach to AAR and potentially to pose estimation in general.
Admittedly, aircraft motion in AR is relatively benign, and the RIPE approach leverages
the actual dynamics of the aircraft to balance speed with accuracy. It was intended that
the process would estimate the position in the Ycam axis as well as it did in the Xcam
axis. The two axes are very similar; however, the influence of the size group (Zcam axis)
on the Ycam axis was apparent. This was a result of implementing the template
matching process with Cartesian coordinates. Another method, such as the digital
zoom alternate method presented in Chapter 4 or the use of spherical coordinates in
the cam-frame, could eliminate errors associated with this interaction. Ultimately,
the approach performs adequately. Further projects involving accurate
implementation of the RIPE method are required to understand fully its capabilities
and limitations.

6.1.4 Processing Speed. The slow processing speed of transferring images
between libraries was unexpected. The problem was exacerbated by the number of
images that had to be rendered and the size of each image. No efficiencies, other than the
OpenGL to OpenCV process presented in Chapter 5, were pursued. Recommendations
for decreasing the time to make estimations are presented in the next section.

6.2  Future  Research 

Currently, the main limitation of the RIPE process applied to AAR is the
processing speed to make estimations based on the images. Other limitations have
been presented, and possible improvements are discussed in this section. In addition
to improvements of the RIPE process, broader applications and adaptations of the
process are presented.

6.2.1 Processing Speed. To improve the processing speed of the RIPE estimation,
a few options can be considered: reduce the number of transfers from OpenGL
to OpenCV per update, reduce the size of the required images, improve the transfer
process, and change the coordinate frame for tracking to improve the tracking process
such that fewer measurements are required.

The first option is to reduce the number of image transfers needed per update.
This could be realized in two ways: either reduce the number of rendered
images used by the RIPE process or pre-render a batch of images. Since the
presented process only required 10-15 images per update, reducing this number any
further requires one of the presented alternate methods to RIPE (Section 4.3.1.4).
Additionally, this would only solve the problem for AAR, not for RIPE in general.
The other way, pre-rendering images, could be accomplished completely prior to
flight, or in real time with a parallel processor making estimates of where the aircraft
might be and the images required. Either option has potential to help reduce a portion
of the processing time.

A second option is to reduce the size of the required images. This method has
potential to decrease the processing time dramatically. By halving both dimensions
of the images used in the field work (1200 x 1600 to 600 x 800), the processing time
could potentially be reduced by 3/4. A thorough analysis of the impact on precision
would determine the viability of this approach. Most likely the largest impact would
be at the farther distances; the error at the pre-contact and contact positions would not
increase dramatically. Incorporating a secondary technique for farther distances was also
suggested for improving errors from the zFar clipping plane and resolvable distances
and could also compensate for the smaller image size impact at farther distances.
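The 3/4 figure follows from the pixel counts, as the short calculation below shows. It assumes, as a simplification, that per-image processing time scales linearly with the number of pixels. The function name is illustrative.

```python
def pixel_count(rows, cols):
    """Number of pixels in an image of the given dimensions."""
    return rows * cols


full = pixel_count(1200, 1600)   # image size used in the field work
half = pixel_count(600, 800)     # both dimensions halved

# Fraction of per-image work removed, assuming time is proportional to pixels.
reduction = 1 - half / full
print(full, half, reduction)
```

Halving both dimensions leaves one quarter of the pixels, so under the linear assumption three quarters of the per-image work is removed.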

The  next  possible  improvement,  with  the  least  impact  on  the  theory  presented, 
is  to  create  a  better  transfer  process.  With  a  better  understanding  of  the  inner 
workings  of  both  libraries  and  the  video  processing  hardware,  a  more  efficient  method 
should  be  possible.  The  OpenGL  process  was  built  for  speed,  and  getting  an  image 
to  screen  is  the  priority.  If  the  renderings  could  be  re-directed  to  a  more  appropriate 
memory  location,  accessible  by  OpenCV,  the  true  limit  to  the  number  of  rendered 
images  would  mainly  be  from  hardware  capacity. 
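Part of the transfer cost comes from the two libraries storing images differently. A framebuffer read in OpenGL returns rows starting at the bottom of the image, while OpenCV by default stores rows from the top down in blue-green-red channel order. The sketch below shows only that CPU-side reordering on a raw byte buffer. It is not the transfer code used in this research. It assumes a tightly packed 8-bit RGB buffer, and the function name is hypothetical.

```python
def gl_rgb_to_cv_bgr(buffer, width, height):
    """Convert a bottom-up, tightly packed RGB buffer to top-down BGR bytes."""
    stride = width * 3
    if len(buffer) != stride * height:
        raise ValueError("buffer size does not match width and height")
    out = bytearray(len(buffer))
    for row in range(height):
        # Output row 0 is the top of the image, which is the last input row.
        start = (height - 1 - row) * stride
        src = buffer[start:start + stride]
        dst = row * stride
        out[dst + 0:dst + stride:3] = src[2::3]  # blue channel
        out[dst + 1:dst + stride:3] = src[1::3]  # green channel
        out[dst + 2:dst + stride:3] = src[0::3]  # red channel
    return bytes(out)


# 2 x 2 example: the first six bytes are the bottom row as read back.
raw = bytes([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
converted = gl_rgb_to_cv_bgr(raw, 2, 2)
```

Every pixel is touched once per rendered image in a step like this, which is one reason the cost grows with both the number and the size of the images. Rendering to a memory location that the image-processing library can read directly would avoid the copy.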

Finally, by changing the coordinate frame to the n-frame, the benign nature of
AR presented in Chapter 3 can be exploited. This method was used in other research
efforts [29], and is probably a more effective coordinate frame for this navigation.
In this frame the tanker is less dynamic, so the required measurement updates can be
reduced, permitting a longer processing time for each measurement update.

Other improvements to the process have been discussed in the thesis: access the
full potential of OpenGL lighting, determine a better image processing technique when
the environmental model does not represent the real world well enough (full sun in
FOV), create a standardized validation process similar to a camera calibration process,
and investigate the use of spherical coordinates in the cam-frame to implement the
template-matching theory. With the spherical coordinate system, the distance from
the camera origin can be adjusted without changing the two angles that make up
the position.
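The property described above can be illustrated with a short conversion sketch. It assumes the cam-frame Z axis is the optical axis and defines the two angles as an azimuth about the Y axis and an elevation out of the X-Z plane. These conventions and the function names are assumptions for the example, not the definitions used elsewhere in the thesis.

```python
import math


def cam_cartesian_to_spherical(x, y, z):
    """Return (range, azimuth, elevation) for a cam-frame point."""
    rng = math.sqrt(x * x + y * y + z * z)
    if rng == 0.0:
        raise ValueError("angles are undefined at the camera origin")
    azimuth = math.atan2(x, z)      # rotation away from the optical axis
    elevation = math.asin(y / rng)  # angle out of the X-Z plane
    return rng, azimuth, elevation


def spherical_to_cam_cartesian(rng, azimuth, elevation):
    """Inverse of cam_cartesian_to_spherical."""
    x = rng * math.cos(elevation) * math.sin(azimuth)
    y = rng * math.sin(elevation)
    z = rng * math.cos(elevation) * math.cos(azimuth)
    return x, y, z


# Doubling the range while holding both angles fixed scales all three
# Cartesian components together, so the line of sight does not move.
rng, az, el = cam_cartesian_to_spherical(3.0, 4.0, 12.0)
moved = spherical_to_cam_cartesian(2.0 * rng, az, el)
```

In Cartesian coordinates the same range change requires coordinated changes in all three components, which is one way to view the coupling between the size group and the Ycam estimate noted in Section 6.1.3.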

With these improvements, the theory presented here can improve on the
current pose estimation with a point tracking process. With improving processing
speed and a maturing method, the RIPE process can be applied to a wide variety of
pose estimation problems.

6.2.2 Alternate Applications and Adaptations. With the full capability of
the theoretical RIPE process, many applications including AAR could benefit.

Within OpenGL, multiple cameras can be defined with different viewing angles
and positions from a rendered object. Incorporating a two (or more) camera cluster
with the RIPE process could decrease many of the errors presented. With the cameras
separated by a large enough distance, determining the image location could potentially
bypass the sizing group completely. This could be accomplished through epipolar
line estimation, or by triangulating the object’s position, without the need to determine
its distance from either camera. Two or more cameras with known position and
orientation relative to each other can improve the RIPE process by reducing the number
of rendered images required, while improving accuracy.
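A minimal sketch of two-camera triangulation is given below. It uses the midpoint of the shortest segment between two lines of sight, which is one simple technique among several and is not drawn from this research. It assumes each camera's center and line-of-sight direction to the object are already expressed in a common frame. All names are hypothetical.

```python
def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(c1, d1, c2, d2, eps=1e-12):
    """Estimate a 3D point from two rays (center c, direction d).

    Returns the midpoint of the closest points on the two rays.
    Raises ValueError when the rays are (nearly) parallel.
    """
    w0 = tuple(p - q for p, q in zip(c1, c2))
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < eps:
        raise ValueError("rays are parallel; position is not observable")
    s = (b * e - c * d) / denom  # parameter along ray 1
    t = (a * e - b * d) / denom  # parameter along ray 2
    p1 = tuple(p + s * q for p, q in zip(c1, d1))
    p2 = tuple(p + t * q for p, q in zip(c2, d2))
    return tuple((u + v) / 2.0 for u, v in zip(p1, p2))


# Two cameras two units apart, both sighting a point 10 units ahead.
target = (0.5, 0.2, 10.0)
cam_left, cam_right = (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)
ray_left = tuple(p - q for p, q in zip(target, cam_left))
ray_right = tuple(p - q for p, q in zip(target, cam_right))
estimate = triangulate_midpoint(cam_left, ray_left, cam_right, ray_right)
```

The position comes directly from the two lines of sight and the known camera separation. No apparent-size measurement is involved, which is the sense in which a camera cluster could bypass the sizing group.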

Also within the OpenGL capabilities is rendering multiple objects, including
representing multiple objects that are connected (articulated). Multiple-object OpenGL
scenes can represent dynamic environments accounting for each object’s additional
degrees of freedom, including objects that are attached to a single aircraft. Objects
such as flight controls or the tanker boom or drogue can be represented in OpenGL
and their state information determined through a parallel RIPE process. With
knowledge of a boom’s location and extension, the visual representation of the tanker
is improved and the pose estimate of the aircraft is more accurate. The work in [6]
demonstrated articulated objects in both single and multiple camera arrangements with a
wire frame model of an object. While this did not utilize OpenGL, many of the
same principles apply. Such a process would increase the number of rendered images
required, and many process improvements would be needed to operate it in real time.

Other adaptations of the RIPE process include different types of cameras
(infrared, 3D, fisheye lens), applications (manufacturing, space vehicles, missile defense),
or model types (simple or symmetric). A different kind of camera or lens, such as an
infrared combination, would permit operations at night and in poor visibility.
Manufacturing applications could further tailor the quick-RIPE method, reducing the
needed rendered images for faster operations. Objects on an assembly line, for
example, might only have two DOFs to make perturbations about. Highly symmetric
object applications (such as tracking missiles) could make use of a priori collected
photos, instead of renderings, of the object with perturbations in only one or two axes.
This would create a database of images of the object in these DOFs. This database,
in conjunction with a digital zoom, could completely bypass the rendering and
transferring of images in real time.
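The link between degrees of freedom and the number of rendered images can be sketched as follows. The example assumes a hypothetical scheme of one nominal pose plus one positive and one negative perturbation per DOF. The perturbation scheme actually used in this research is described in earlier chapters and may differ. The names and step sizes are illustrative.

```python
def perturbed_poses(nominal, steps):
    """Return the nominal pose plus a +/- perturbation for each listed DOF.

    nominal: dict mapping DOF name to its current estimate
    steps:   dict mapping DOF name to its perturbation size
    """
    poses = [dict(nominal)]
    for name, step in steps.items():
        for sign in (1, -1):
            pose = dict(nominal)
            pose[name] += sign * step
            poses.append(pose)
    return poses


# Full six-DOF pose, as for a tanker seen from a receiver aircraft.
six_dof = {"x": 0.0, "y": 0.0, "z": 60.0, "roll": 0.0, "pitch": 0.0, "yaw": 0.0}
six_steps = {name: 0.5 for name in six_dof}
aircraft_images = len(perturbed_poses(six_dof, six_steps))

# An object on an assembly line that can only move in two DOFs.
two_dof = {"x": 0.0, "yaw": 0.0}
two_steps = {"x": 0.5, "yaw": 1.0}
assembly_images = len(perturbed_poses(two_dof, two_steps))
```

Under this scheme the image count grows as 2k + 1 for k DOFs, so restricting the motion to two DOFs cuts the renderings per update from 13 to 5.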

With the current processing capabilities of contemporary computers, areas that
could benefit immediately would have fewer DOFs of the object(s) in the scene or an
object with limited dynamic motion potential. For example, land or naval-based
navigation applications would operate faster with a decreased number of rendered images
because of the limited capability of objects in these environments. These environments
would have fewer DOFs, fewer required perturbation images, and therefore quicker
processing. In conclusion, the current limitations notwithstanding, many applications
requiring pose could be improved with a RIPE-type process.

6.3  Summary

The science of image-aided navigation is still in its infancy. Many methods
and alternatives provide specific solutions to specific problems as a balance of each
method’s benefits and weaknesses. As a viable method, the theory of this thesis has
been proven adequate in some areas and weak in others. As an approach to AAR, this
research has identified those aspects that limit its full capability. With the proposed
improvements, an efficient RIPE process can serve as augmentation and backup to
DGPS in solving not only the AAR problem, but pose in general.